arXiv:1504.03987v2 [math.PR] 27 Jul 2015 


RANDOM LAPLACIAN MATRICES AND CONVEX 
RELAXATIONS 

AFONSO S. BANDEIRA 


Abstract. The largest eigenvalue of a matrix is always at least as large as
its largest diagonal entry. We show that for a large class of random Laplacian
matrices, this bound is essentially tight: the largest eigenvalue is, up to lower
order terms, often the size of the largest diagonal entry.

Besides being a simple tool to obtain precise estimates on the largest eigenvalue
of a large class of random Laplacian matrices, our main result settles
a number of open problems related to the tightness of certain convex
relaxation-based algorithms. It easily implies the optimality of the semidefinite
relaxation approaches to problems such as $\mathbb{Z}_2$ Synchronization and Stochastic
Block Model recovery. Interestingly, this result readily implies the
connectivity threshold for Erdős-Rényi graphs and suggests that these three
phenomena are manifestations of the same underlying principle. The main
tool is a recent estimate on the spectral norm of matrices with independent
entries by van Handel and the author.


1. Introduction

Towards the end of the 1950s, Eugene Wigner [48] made the remarkable finding 
that the spectrum of a large class of random matrices is, in high dimension,
distributed essentially the same way: under mild assumptions, the distribution of the
spectrum converges to the so-called Wigner semicircle law. The study of spectral 
properties of random matrices has since spawned a panoply of fascinating research 
with important implications in many areas. We refer the reader to the books [43, 6] 
for more on this subject. 

The present paper addresses the problem of estimating the largest eigenvalue of 
a large class of Laplacian matrices. The investigation of such problems has strong 
motivations from algorithmic analysis. Indeed, the performance of many popular 
algorithms is tightly connected with the largest eigenvalue of some matrix that 
depends on its input, and so studying the performance of such algorithms over 
random inputs involves understanding the behavior of the largest eigenvalue of a 
random matrix. In fact, as we will see, the estimates derived here play a crucial role 
in understanding the typical performance of a natural semidefinite programming- 
based approach for solving certain computationally hard problems on graphs, such 
as community detection. 

We use the term Laplacian matrix to refer to symmetric matrices whose rows
and columns sum to zero. While Laplacians are often also thought of as
being positive semidefinite, the matrices we will treat will not necessarily satisfy
this property. Spectral graph theory inspires a useful way of thinking about these
matrices [18]. Given a graph on $n$ nodes with edge set $E$, its adjacency matrix
$A \in \mathbb{R}^{n \times n}$ is defined by $A_{ij} = 1$ if $(i,j) \in E$ and $A_{ij} = 0$ otherwise, and its degree
matrix $D_A$ is a diagonal matrix whose $i$-th diagonal entry is equal to the degree of
node $i$. The Laplacian of the graph is defined to be $L_A = D_A - A$. The spectrum
of the graph Laplacian matrix is known to contain important information about
the graph [18], and has been studied for random graphs [22, 17, 14]. Analogously,
we make the following definition.

Definition 1.1. Given a symmetric matrix $X \in \mathbb{R}^{n \times n}$, we define the Laplacian
$L_X$ of $X$ as

\[ L_X = D_X - X, \]

where $D_X$ is the diagonal matrix whose diagonal entries are given by

\[ (D_X)_{ii} = \sum_{j=1}^{n} X_{ij}. \]

We will refer to any such matrix $L_X$ as a Laplacian matrix. Note that these are
precisely the symmetric matrices $L$ for which $L\mathbf{1} = 0$, where $\mathbf{1} \in \mathbb{R}^n$ denotes the
all-ones vector.
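As a small illustration of Definition 1.1, the following sketch builds $L_X = D_X - X$ for a symmetric matrix stored as a list of rows and checks that every row sums to zero. The function name `laplacian` and the example matrix are choices made here for illustration and do not come from the paper.

```python
def laplacian(X):
    """Return L_X = D_X - X for a symmetric matrix X given as a list of rows."""
    n = len(X)
    # Start from -X, then add the row sums (D_X)_ii = sum_j X_ij on the diagonal.
    L = [[-X[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        L[i][i] += sum(X[i])
    return L


# A symmetric matrix with entries of both signs (illustrative values).
X = [[0, 1, -2],
     [1, 0, 3],
     [-2, 3, 0]]
L = laplacian(X)
row_sums = [sum(row) for row in L]  # every entry is 0, i.e. L 1 = 0
```

Note that the first diagonal entry of `L` in this example is negative, which matches the remark that these Laplacians need not be positive semidefinite.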

This paper is concerned with a class of random Laplacian matrices Lx where the 
entries of the matrix X are independent centered (but not necessarily identically 
distributed) random variables. Our main result is that, under mild and easily 
verifiable conditions, the largest eigenvalue of Lx is, up to lower order terms, given 
by its largest diagonal entry. While we defer the formal statement of our main
results^1 to Section 3, we informally state them here.

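Informal Statement of Theorem (3.1). Let $L$ be an $n \times n$ symmetric random
Laplacian matrix (i.e. satisfying $L\mathbf{1} = 0$) with centered independent off-diagonal
entries such that $\sum_{j \in [n] \setminus i} \mathbb{E} L_{ij}^2$ is equal for every $i$, and

\[ \sum_{j \in [n] \setminus i} \mathbb{E} L_{ij}^2 \gtrsim \max_{i \neq j} \|L_{ij}\|_\infty^2 \log n. \]

Then, with high probability,

\[ \lambda_{\max}(L) - \max_i L_{ii} \lesssim (\log n)^{-1/2} \max_i L_{ii}. \]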

Not only does our main result provide an extremely simple tool to precisely 
estimate the largest eigenvalue of Laplacian matrices, but in the applications studied 
below, the largest diagonal value also enjoys an interpretation that is intimately tied 
to the underlying problem. 

To illustrate the latter point, we turn back to graph theory. It is well known 
that the spectrum of the Laplacian of a graph dictates whether or not the graph 
is connected. On the other hand, its diagonal is simply given by the degrees of 
the nodes of the graph. A relation between the spectrum of the Laplacian and its 
diagonal could then translate into a relation between degrees of nodes of a graph 
and its connectivity. In fact, such a relation is already known to exist: the phase
transition for connectivity of Erdős-Rényi graphs^2 coincides with the one for the
existence of isolated nodes. While it is true that any graph with an isolated node (a
node with degree zero) cannot be connected, the converse is far from true, rendering
this phenomenon particularly interesting. In Section 4.1, we will use our main result
to provide a simple and illustrative proof of this phenomenon.

^1 Our results will be of nonasymptotic nature (we refer the interested reader to [47] for a tutorial
on nonasymptotic estimates in random matrix theory).

We will use our main result to give sharp guarantees for certain algorithms that
solve the $\mathbb{Z}_2$ Synchronization problem and the community detection problem in the
Stochastic Block Model. The $\mathbb{Z}_2$ Synchronization problem consists of recovering
binary labels $x_i = \pm 1$ associated with nodes of a graph from noisy (pairwise)
measurements of $x_i x_j$ whenever $(i,j)$ is an edge of the graph (see [42]). This problem
is intimately related to correlation clustering [11]. Despite its hardness, spectral
methods and semidefinite programming-based methods are known to perform well
in both the worst-case [9] and average-case settings [1, 2, 19].^3

Community detection, or clustering, in a graph is a central problem in countless 
applications. Unfortunately, even the simplified version of partitioning a graph into 
two vertex sets, with the same size, that minimize the number of edges across the 
partition, referred to as minimum bisection, is known to be NP-hard. Nevertheless, 
certain heuristics are known to work well for typical realizations of random graph 
models that exhibit community structure [35, 12, 25]. In this setting, a particularly
popular model is the Stochastic Block Model with two communities. 

Definition 1.2. (Stochastic Block Model with two communities) Given $n$ even, and
$0 < p, q < 1$, we say that a random graph $G$ is drawn from $\mathcal{G}(n,p,q)$, the Stochastic
Block Model with two communities, if $G$ has $n$ nodes, divided in two clusters of $n/2$
nodes each, and for each pair of vertices $i,j$, $(i,j)$ is an edge of $G$ with probability
$p$ if $i$ and $j$ are in the same cluster and with probability $q$ otherwise, independently
from any other edge.
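Definition 1.2 translates directly into a sampler. The sketch below is illustrative only: the function name `sample_sbm`, the `seed` parameter, and the convention that nodes $0,\dots,n/2-1$ form the first cluster are assumptions made here, not part of the definition.

```python
import random


def sample_sbm(n, p, q, seed=None):
    """Sample an adjacency matrix from the two-community Stochastic Block Model.

    Assumed convention: nodes 0..n/2-1 form one cluster, n/2..n-1 the other.
    """
    if n % 2 != 0:
        raise ValueError("n must be even")
    rng = random.Random(seed)
    half = n // 2
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            same_cluster = (i < half) == (j < half)
            prob = p if same_cluster else q
            # Each pair is decided independently of all the others.
            if rng.random() < prob:
                A[i][j] = A[j][i] = 1
    return A


A = sample_sbm(8, 0.9, 0.1, seed=0)
```

With $p > q$, most of the edges in `A` fall inside the two diagonal blocks, which is the community structure the recovery problem tries to exploit.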

We will focus on the setting $p > q$. The problem of recovering, from a realization
$G \sim \mathcal{G}(n,p,q)$, the original partition of the underlying vertices gained popularity
when Decelle et al. [21] conjectured a fascinating phase transition in the constant
average-degree regime. More precisely, if $p = a/n$ and $q = b/n$ with $a > b$ constants, it
was conjectured that as long as

\[ (a - b)^2 > 2(a + b), \]

it is possible to make an estimate of the original partition that correlates with
the true partition, and that below this threshold it is impossible to do so. This
conjecture was later proven in a remarkable series of works by Mossel et al. [38, 37]
and Massoulié [34]. Instead of settling for an estimate that correlates with the true
partition, we will focus on exactly recovering the partition. A phase transition for
this problem was established by Abbe et al. [3] and independently by Mossel et
al. [36]. We will show that a certain semidefinite programming-based algorithm
succeeds up to the information-theoretic threshold, thus settling a problem posed
in [3]. We remark that, while the present paper was being written, it was brought to
our attention that this problem was also solved independently by parallel research
efforts of Hajek et al. [28].

The use of semidefinite relaxations in combinatorial optimization dates back to
the late 1970s with the seminal work of László Lovász [32] on the so-called Lovász
theta function; this approach was shortly after made algorithmic in [27]. In the
first half of the 1990s, interior point methods were adapted to solve semidefinite
programs [5, 39], providing reasonably efficient methods to solve these problems. In
1995, Goemans and Williamson devised the first approximation algorithm based
on semidefinite programming [26]. Their algorithm gave the best known approximation
ratio to the Max-Cut problem. Ever since, many approximation algorithms
have been designed based on semidefinite programming. In fact, the algorithm we
will analyze is greatly inspired by the semidefinite relaxation in [26]. Remarkably,
an important conjecture of Khot [30] is known to imply that for a large class of
problems including Max-Cut, this approach produces optimal approximation
ratios [40].

^2 The Erdős-Rényi model for random graphs will be discussed in more detail in Section 4.1.
^3 The information-theoretic limits of this problem have also been investigated [1, 2, 15, 16].

An approximation ratio is a guarantee that, for any possible instance of the input, 
the algorithm outputs a solution whose performance is at least a certain fraction 
(the approximation ratio) of the optimal one. The worst-case nature of this type of 
guarantee is often pessimistic. A popular alternative is to equip the input with a 
distribution (such as, for example, the Stochastic Block Model) and give guarantees 
for most inputs. More precisely, we will be interested in understanding when it is the
case that the semidefinite relaxation approach gives exactly the correct answer (for
most inputs). The tendency for a large class of semidefinite relaxations to be tight^4
has been observed and conjectured, for example, in [8]. One of the main insights of
this paper is the fact that the phenomenon described by our main result provides 
a unifying principle for understanding the tightness of many convex relaxations. 

1.1. Notation. We will make use of several standard matrix and probability notations.
For $M$ a matrix we will denote its $k$-th smallest eigenvalue by $\lambda_k(M)$, largest
eigenvalue by $\lambda_{\max}(M)$, and its spectral norm by $\|M\|$. $\mathrm{diag}(M)$ will be used to
refer to a vector with the diagonal elements of $M$ as entries. For $x \in \mathbb{R}^n$ a vector,
$\mathrm{diag}(x)$ will denote a diagonal matrix $D \in \mathbb{R}^{n \times n}$ with $D_{ii} = x_i$.

$\mathbf{1}$ will denote the all-ones vector, whenever there is no risk of ambiguity for its
dimension.

For a scalar random variable $Y$, we will write its $p$-norm as $\|Y\|_p = (\mathbb{E}|Y|^p)^{1/p}$
and infinity norm as $\|Y\|_\infty = \inf\{a : |Y| \le a \text{ a.s.}\}$.

Given a graph, $\deg(i)$ will be used to denote the degree of node $i$. In the case
of the Stochastic Block Model, $\deg_{\mathrm{in}}(i)$ will be used for inner-cluster degree and
$\deg_{\mathrm{out}}(i)$ for outer-cluster degree.

We will say that an event $\mathcal{E}$ happens with high probability when

\[ \mathbb{P}[\mathcal{E}] = 1 - n^{-\Omega(1)}, \]

where $n$ is an underlying parameter that is thought of as going to infinity (such as the
dimension of the matrices or the number of nodes in the graphs being studied).

^4 When the optimal solution of a semidefinite relaxation is the optimal solution of the original
problem we say that the relaxation is tight.

2. A SIMPLER PROBLEM: $\mathbb{Z}_2$ SYNCHRONIZATION WITH GAUSSIAN NOISE

Before presenting our main results in Section 3, we will motivate them through
a simplified version of the problems of $\mathbb{Z}_2$ Synchronization and recovery in the
Stochastic Block Model: given a noise level $\sigma$ and a vector $z \in \{\pm 1\}^n$, suppose we
are given noisy measurements

\[ Y_{ij} = z_i z_j + \sigma W_{ij}, \]

for each pair $(i,j)$, where $W_{ij}$ are i.i.d. standard Gaussian random variables (with
$W_{ij} = W_{ji}$). A version of this problem, over the complex numbers, is treated
in [7]. Our objective is to devise an algorithm that recovers the correct $z$ with high
probability. By definition, the maximum a posteriori (MAP) estimator maximizes
the probability of recovering the correct variable. Given that we have no a priori
information on $z$ we assume a uniform prior, in which case the MAP estimator
coincides with the Maximum Likelihood Estimator (MLE) for $z$. The latter is the
solution of

\[
\begin{array}{ll}
\max & x^T Y x \\
\text{s.t.} & x \in \mathbb{R}^n \\
 & x_i^2 = 1,
\end{array}
\tag{1}
\]

which is referred to as the little Grothendieck problem over $\mathbb{R}$ and known to be
NP-hard in general. In fact, (1) includes the Max-Cut problem by taking $Y$ to be
the Laplacian of a graph. In the spirit of the relaxation proposed in [26] for the
Max-Cut problem, we take $X = xx^T$ and rewrite (1) as

\[
\begin{array}{ll}
\max & \mathrm{Tr}(YX) \\
\text{s.t.} & X_{ii} = 1 \\
 & X \succeq 0 \\
 & \mathrm{rank}(X) = 1.
\end{array}
\tag{2}
\]

We now relax the nonconvex rank constraint and arrive at the following semidefinite
program, which can be solved in polynomial time up to arbitrary precision [46].

\[
\begin{array}{ll}
\max & \mathrm{Tr}(YX) \\
\text{s.t.} & X_{ii} = 1 \\
 & X \succeq 0.
\end{array}
\tag{3}
\]

As will become clear in the following sections, this relaxation is also used to
solve $\mathbb{Z}_2$ Synchronization and recovery in the Stochastic Block Model, albeit for a
different coefficient matrix $Y$.

In what follows we will derive conditions for when a certain rank 1 matrix is the
unique optimal solution of (3). Note that if $X = xx^T$ is the unique solution to (3),
then $x$ must be the solution to (1), meaning that we are able to compute the MLE
efficiently by solving (3). This motivates us to understand when it is the case that
$X = xx^T$ is the unique optimal solution of (3). A fruitful way of approaching this
relies on duality. The dual of (3) is given by:

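\[
\begin{array}{ll}
\min & \mathrm{Tr}(D) \\
\text{s.t.} & D \text{ is diagonal} \\
 & D - Y \succeq 0.
\end{array}
\tag{4}
\]

Weak duality guarantees that if $X$ and $D$ are feasible solutions of respectively
(3) and (4) then $\mathrm{Tr}(YX) \le \mathrm{Tr}(D)$. Indeed, since $X$ and $D - Y$ are both positive
semidefinite, and $X_{ii} = 1$ gives $\mathrm{Tr}(DX) = \mathrm{Tr}(D)$, we must have

\[ 0 \le \mathrm{Tr}[(D - Y)X] = \mathrm{Tr}(D) - \mathrm{Tr}(YX). \tag{5} \]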

This means that if we are able to find a so-called dual certificate, a matrix $D$
feasible for (4) for which $\mathrm{Tr}(D) = \mathrm{Tr}(Yxx^T)$, then it guarantees that $X = xx^T$ is
an optimal solution of (3). To guarantee uniqueness it suffices to further ensure
that $\lambda_2(D - Y) > 0$. In fact, if there existed another optimal solution $X$, by (5), one
would have $\mathrm{Tr}[(D - Y)X] = 0$ which can be shown to imply (see, for example, [1]),
together with the feasibility of $X$, that $X = xx^T$. This establishes the following
Lemma.

Lemma 2.1. [Dual Certificate] Let $Y$ be a symmetric $n \times n$ matrix and $x \in \{\pm 1\}^n$.
If there exists a diagonal matrix $D$, such that $\mathrm{Tr}(D) = x^T Y x$, $D - Y \succeq 0$, and
$\lambda_2(D - Y) > 0$ then $X = xx^T$ is the unique optimal solution of (3).

We take a candidate dual certificate $D$ whose diagonal elements are given by

\[ D_{ii} = \sum_{j=1}^{n} Y_{ij} x_i x_j. \]

Note that $D = D_{[\mathrm{diag}(x) Y \mathrm{diag}(x)]}$ as per Definition 1.1. It is easy to see that $\mathrm{Tr}(D) =
x^T Y x$ and $(D - Y)x = 0$ which gives the following Lemma.

Lemma 2.2. Let $Y$ be a symmetric $n \times n$ matrix and $x \in \{\pm 1\}^n$. Let $D$ be the
diagonal matrix defined as $D = D_{[\mathrm{diag}(x) Y \mathrm{diag}(x)]}$. As long as

\[ \lambda_2(D - Y) > 0, \]

$X = xx^T$ is the unique optimal solution of (3).
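The two identities used for the candidate certificate, $\mathrm{Tr}(D) = x^T Y x$ and $(D - Y)x = 0$, can be checked by hand on a small example. The sketch below does so with exact integer arithmetic; the helper names and the example matrix are illustrative choices made here. It deliberately does not check the spectral condition $\lambda_2(D - Y) > 0$, which is the part that depends on $Y$.

```python
def candidate_certificate(Y, x):
    """Diagonal of the candidate D, with D_ii = sum_j Y_ij x_i x_j."""
    n = len(Y)
    return [sum(Y[i][j] * x[i] * x[j] for j in range(n)) for i in range(n)]


def quadratic_form(Y, x):
    """Value of x^T Y x."""
    n = len(Y)
    return sum(Y[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def residual(Y, x, d):
    """Entries of (D - Y) x for the diagonal matrix D = diag(d)."""
    n = len(Y)
    return [d[i] * x[i] - sum(Y[i][j] * x[j] for j in range(n)) for i in range(n)]


# Illustrative symmetric matrix and sign vector.
Y = [[0, 2, -1],
     [2, 0, 3],
     [-1, 3, 0]]
x = [1, -1, 1]
d = candidate_certificate(Y, x)
trace_matches = sum(d) == quadratic_form(Y, x)      # Tr(D) = x^T Y x
kernel_matches = residual(Y, x, d) == [0, 0, 0]     # (D - Y) x = 0
```

Both checks hold for any symmetric $Y$ and any $x \in \{\pm 1\}^n$, since $x_i^2 = 1$; this is why only the condition on $\lambda_2$ remains in Lemma 2.2.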

Note that these guarantees (Lemmas 2.1 and 2.2) do not depend on the matrix
$Y$ or the distribution from which it is drawn.

Let us return to the setting in which $Y = zz^T + \sigma W$, where $W$ is a standard
Wigner matrix: a symmetric matrix with i.i.d. standard Gaussian entries. We want to
determine for which values of $\sigma$ one expects $X = zz^T$ to be, with high probability,
the solution of (3), as we are interested not only in computing the MLE but also in
having it coincide with the planted vector $z$ we want to recover. Since
$\mathrm{diag}(z) W \mathrm{diag}(z) \sim W$ we can, without loss of generality, take $z = \mathbf{1}$. In that case,
we are interested in understanding when

\[ \lambda_2\left( D_{[\mathbf{1}\mathbf{1}^T + \sigma W]} - (\mathbf{1}\mathbf{1}^T + \sigma W) \right) > 0. \tag{6} \]

Since

\[ D_{[\mathbf{1}\mathbf{1}^T + \sigma W]} - (\mathbf{1}\mathbf{1}^T + \sigma W) = (n I_{n \times n} - \mathbf{1}\mathbf{1}^T) - \sigma(-D_W + W) = L_{[\mathbf{1}\mathbf{1}^T]} - \sigma L_{[-W]}, \]

and $\mathbf{1}$ is always in the nullspace of any Laplacian matrix, it is not difficult to see
that (6) is equivalent to

\[ \lambda_{\max}\left(L_{[-W]}\right) < \frac{n}{\sigma}. \tag{7} \]

The triangle inequality tells us that $\lambda_{\max}(L_{[-W]}) \le \lambda_{\max}(-D_W) + \|W\|$. It
is well known that, for any $\varepsilon > 0$, $\|W\| \le (2 + \varepsilon)\sqrt{n}$ with high probability (see, for
example, Theorem 11.11 in [20]). On the other hand,

\[ \lambda_{\max}(-D_W) = \max_{i \in [n]} \left[ -(D_W)_{ii} \right], \]

which is the maximum of $n$ Gaussian random variables each with variance $n$. A
simple union bound yields that, for any $\varepsilon > 0$, $\lambda_{\max}(-D_W) < \sqrt{(2 + \varepsilon) n \log n}$
with high probability. This readily implies an exact recovery guarantee for $\mathbb{Z}_2$
Synchronization with Gaussian noise.

Proposition 2.3. Let $z \in \{\pm 1\}^n$ and $Y = zz^T + \sigma W$ where $W$ is a symmetric
matrix with i.i.d. standard Gaussian entries. If there exists $\varepsilon > 0$ such that

\[ \sigma < \frac{\sqrt{n}}{\sqrt{(2 + \varepsilon) \log n}}, \]

then, with high probability, $X = zz^T$ is the unique solution to the
Semidefinite Program (3).
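For the reader who wants the arithmetic behind the proposition spelled out, here is a short sketch of how the two bounds combine. The auxiliary constant $\varepsilon'$ is introduced here only for illustration; the sketch uses the criterion that exact recovery holds once $\sigma \lambda_{\max}(L_{[-W]}) < n$.

```latex
% Take any 0 < \varepsilon' < \varepsilon. With high probability, for n large enough,
\lambda_{\max}\bigl(L_{[-W]}\bigr)
  \le \lambda_{\max}(-D_W) + \|W\|
  \le \sqrt{(2+\varepsilon')\, n \log n} + (2+\varepsilon')\sqrt{n}
  \le \sqrt{(2+\varepsilon)\, n \log n},
% because the \sqrt{n} term is of lower order than \sqrt{n \log n}. Hence
\sigma < \frac{\sqrt{n}}{\sqrt{(2+\varepsilon)\log n}}
  \quad\Longrightarrow\quad
  \sigma\, \lambda_{\max}\bigl(L_{[-W]}\bigr) < n .
```

The last implication is just $\sqrt{n} \cdot \sqrt{(2+\varepsilon)\, n \log n} \,/ \sqrt{(2+\varepsilon)\log n} = n$.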


Let us investigate the optimality of this upper bound on $\sigma$. If the diagonal
elements of $D_{[-W]}$ were independent^5, their maximum would be known to indeed
concentrate around $\sqrt{2 n \log n}$, suggesting that

\[ \|W\| \ll \lambda_{\max}\left(D_{[-W]}\right), \tag{8} \]

which would imply

\[ \lambda_{\max}\left(L_{[-W]}\right) = [1 + o(1)]\, \lambda_{\max}\left(D_{[-W]}\right). \tag{9} \]

Both of these statements can be rigorously shown to be true. While a simple
adaptation of the proof of Theorem 3.1 can establish (8) and (9) we omit their proofs
for the sake of brevity, but emphasize that in this particular setting (where $W$ is a
standard Wigner matrix), one does not need the whole strength of Theorem 3.1 as
simple elementary proofs exist.

This suggests that, in rough terms, the success of the relaxation (3) depends
mostly on whether $\lambda_{\max}(D_{[-W]}) < n/\sigma$, which is equivalent to

\[ \max_{i \in [n]} \left[ -\sigma \sum_{j=1}^{n} W_{ij} \right] < n, \tag{10} \]

which can be interpreted as a bound on the amount of noise per row of $Y$. We
argue next that this type of upper bound is indeed necessary for any method to
succeed at recovering $z$ from $Y$.

Once again, let us consider $z = \mathbf{1}$ without loss of generality. Let us consider
an oracle version of the problem in which one is given the correct label of every single
node except node $i$. It is easy to see that the maximum likelihood estimator for
$z_i$ on this oracle problem is given by

\[ \mathrm{sign}\left[ \sum_{j \in [n] \setminus i} Y_{ij} \right] = \mathrm{sign}\left[ n - 1 + \sigma \sum_{j \in [n] \setminus i} W_{ij} \right], \]

which would give the correct answer if and only if

\[ -\sigma \sum_{j \in [n] \setminus i} W_{ij} < n - 1. \tag{11} \]

This means that if

\[ \max_{i \in [n]} \left[ -\sigma \sum_{j \in [n] \setminus i} W_{ij} \right] > n - 1, \tag{12} \]

one does not expect the MLE to succeed (with high probability) at recovering $z$
from $Y = zz^T + \sigma W$. This means that (with a uniform prior on $z$) no method
is able to recover $z$ with high probability. Note the similarity between (10) and
(12). This strongly suggests the optimality of the semidefinite programming based
approach (3).
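The oracle argument is easy to simulate. The following sketch, with $z = \mathbf{1}$ as in the text, draws $Y = \mathbf{1}\mathbf{1}^T + \sigma W$ and applies the oracle estimator to every node. The function names, the seed, and the parameter values are assumptions made here for illustration.

```python
import random


def noisy_measurements(n, sigma, seed=None):
    """Y = 1 1^T + sigma * W, with W symmetric and standard Gaussian entries."""
    rng = random.Random(seed)
    Y = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            w = rng.gauss(0.0, 1.0)
            Y[i][j] = Y[j][i] = 1.0 + sigma * w
    return Y


def oracle_estimate(i, Y):
    """Oracle MLE for z_i when every other label is +1: sign of sum_{j != i} Y_ij."""
    s = sum(Y[i][j] for j in range(len(Y)) if j != i)
    # A tie (s == 0) has probability zero; it is resolved as +1 arbitrarily.
    return 1 if s >= 0 else -1


n = 200
Y = noisy_measurements(n, sigma=1.0, seed=1)
# Number of nodes for which the oracle estimator disagrees with the truth z_i = +1,
# i.e. nodes for which condition (11) fails.
errors = sum(oracle_estimate(i, Y) != 1 for i in range(n))
```

Increasing `sigma` toward the order of $\sqrt{n / \log n}$ is where one would expect `errors` to start becoming nonzero, in line with the comparison of (10) and (12).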


^5 The diagonal entries of $D_{[-W]}$ are not independent because each pair of sums shares a term $W_{ij}$
as a summand.

These optimality arguments can be made rigorous. In fact, in Section 4, we will
establish precise optimality results of this type, for the applications we are interested
in. The main ingredient (8) in the rough argument above was the realization that
the spectral norm of $W$ is, with high probability, asymptotically smaller than the
largest diagonal entry of $D_{[-W]}$. Theorems 3.1 and 3.2 establish precisely this fact
for a large class of matrices with independent off-diagonal entries. Empowered with
this result, we will be able to establish optimality for the semidefinite programming
approach to solve the problems of $\mathbb{Z}_2$ Synchronization and recovery in the stochastic
block model, where the underlying random matrices have much less well understood
distributions. Modulo the use of Theorem 3.1, the arguments used will be very
reminiscent of the ones above.

It is pertinent to compare this approach with the one of using the noncommutative
Khintchine inequality, or the related matrix concentration inequalities [44, 45],
to estimate the spectral norms in question. Unfortunately, those general purpose
methods are, in our case, not fine enough to give satisfactory results. One illustration
of their known suboptimality is the fact that the upper bound they give for
$\|W\|$ is of order $\sqrt{n \log n}$, which does not allow one to establish (8), a crucial step in
the argument. In fact, the looseness of these bounds is reflected in the suboptimal
guarantees obtained in [1, 2, 3]. Our results are able to establish a phenomenon
of the type of (8) by relying on recent sharp estimates for the spectral norm of
matrices with independent entries in [10].

3. Main Results 

We use this section to formulate precise versions of, and briefly discuss, our main 
results. 

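Theorem 3.1. Let $L$ be an $n \times n$ symmetric random Laplacian matrix (i.e. satisfying
$L\mathbf{1} = 0$) with centered independent off-diagonal entries such that $\sum_{j \in [n] \setminus i} \mathbb{E} L_{ij}^2$
is equal for every $i$.

Define $\sigma$ and $\sigma_\infty$ as

\[ \sigma^2 = \sum_{j \in [n] \setminus i} \mathbb{E} L_{ij}^2 \quad \text{and} \quad \sigma_\infty^2 = \max_{i \neq j} \|L_{ij}\|_\infty^2. \]

If there exists $c > 0$ such that

\[ \sigma > c (\log n)^{1/2} \sigma_\infty, \tag{13} \]

then there exist $c_1$, $C_1$, $\beta_1$, all positive and depending only on $c$, such that

\[ \lambda_{\max}(L) \le \left(1 + \frac{C_1}{\sqrt{\log n}}\right) \max_i L_{ii} \]

with probability at least $1 - c_1 n^{-\beta_1}$.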

Even though we were not able to find a convincing application for which $\sigma/\sigma_\infty$ was
asymptotically growing but slower than $\sqrt{\log n}$, we still include the theorem below
for the sake of completeness.

Theorem 3.2. Let $L$ be an $n \times n$ symmetric random Laplacian matrix (i.e. satisfying
$L\mathbf{1} = 0$) with centered independent off-diagonal entries such that $\sum_{j \in [n] \setminus i} \mathbb{E} L_{ij}^2$
is equal for every $i$.

Define $\sigma$ and $\sigma_\infty$ as

\[ \sigma^2 = \sum_{j \in [n] \setminus i} \mathbb{E} L_{ij}^2 \quad \text{and} \quad \sigma_\infty^2 = \max_{i \neq j} \|L_{ij}\|_\infty^2. \]

If there exist $c$ and $\gamma > 0$ such that

\[ \sigma > c (\log n)^{1/4 + \gamma} \sigma_\infty, \tag{14} \]

then there exist $c_2$, $C_2$, $\epsilon$ and $\beta_2$, all positive and depending only on $c$ and $\gamma > 0$,
such that

\[ \lambda_{\max}(L) \le \left(1 + \frac{C_2}{(\log n)^{\epsilon}}\right) \max_i L_{ii} \]

with probability at least $1 - c_2 \exp\left(-(\log n)^{\beta_2}\right)$.

Remark 3.3. In the theorems above, the condition that $\sum_{j \in [n] \setminus i} \mathbb{E} L_{ij}^2$ is equal for
every $i$ can be relaxed to the requirement that

(fa^< 

je[n]\i 

for all i. This requires only simple adaptations to the proofs of these theorems. 


While we defer the proof of these theorems to Section 5, we briefly describe its
idea. Lemma 5.1 (borrowed from [10]) estimates that

\[ \|X\| \lesssim \sigma + \sigma_\infty \sqrt{\log n}, \]

where $-X$ is the off-diagonal part of $L$. On the other hand, $L_{ii} = \sum_{j \in [n] \setminus i} X_{ij}$
has variance $\sigma^2$ and the Central Limit Theorem would suggest that the $L_{ii}$ behave like
independent Gaussians of variance $\sigma^2$, which would mean that $\max_i L_{ii} \sim \sigma \sqrt{\log n}$,
rendering the contribution of the off-diagonal entries (to the largest eigenvalue)
negligible. However, several difficulties arise: the diagonal entries are not independent
(as each pair shares a summand) and one needs to make sure that the central limit
theorem behavior sets in (this is, in a way, ensured by requirements (13) and (14)).
The proofs in Section 5 make many needed adaptations to this argument to make
it rigorous.
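The heuristic can be condensed into one line. What follows is a rough sketch, not part of the proof: it takes the Gaussian heuristic $\max_i L_{ii} \approx \sigma\sqrt{2 \log n}$ at face value, writes the bound of Lemma 5.1 with an unspecified absolute constant $C$ (an assumption made here for illustration), and assumes a hypothesis of the form $\sigma \ge c \sqrt{\log n}\, \sigma_\infty$.

```latex
\frac{\|X\|}{\max_i L_{ii}}
  \lesssim \frac{C\left(\sigma + \sigma_\infty \sqrt{\log n}\right)}{\sigma \sqrt{2 \log n}}
  \le \frac{C\,(1 + 1/c)}{\sqrt{2 \log n}},
\qquad\text{so that}\qquad
\lambda_{\max}(L) \le \max_i L_{ii} + \|X\|
  \le \left(1 + O\!\left(\frac{1}{\sqrt{\log n}}\right)\right) \max_i L_{ii}.
```

The second step uses $\sigma_\infty \sqrt{\log n} \le \sigma / c$, and the bound on $\lambda_{\max}(L)$ is the triangle inequality applied to the diagonal and off-diagonal parts of $L$.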


4. Applications 

We now turn our attention to applications of the main results. As a form of
warm-up we will start with understanding connectivity of Erdős-Rényi graphs.

4.1. Connectivity of Erdős-Rényi graphs. Recall that, for an integer $n$ and an
edge probability parameter $0 < p < 1$, the Erdős-Rényi graph model [24] $\mathcal{G}(n,p)$ is
a random graph on $n$ nodes where each one of the $\binom{n}{2}$ edges appears independently
with probability $p$.

We are interested in understanding the probability that $G$, drawn according to
$\mathcal{G}(n,p)$, is a connected graph. We will restrict our attention to the setting $p < 1/2$.
Let $L$ be the Laplacian of the random graph, given by $D - A$ where $A$ is its
adjacency matrix and $D$ a diagonal matrix containing the degree of each node. It
is well-known (see, e.g., [18]) that $G$ being connected is equivalent to $\lambda_2(L) > 0$.

It is clear that if $G$ has an isolated node then it cannot be connected. It is also
known that for there not to be isolated nodes one needs the average degree of each
node to be at least logarithmic [24]. For this reason we will focus on the regime

\[ p = \frac{\rho \log n}{n}, \]

for a constant $\rho$. It is easy to establish a phase transition on the degrees of the
nodes of graphs drawn from $\mathcal{G}(n,p)$.

Lemma 4.1. Let $n$ be a positive integer, $\rho$ a constant, and $p = \frac{\rho \log n}{n}$. Let $G$ be a
random graph drawn from $\mathcal{G}(n,p)$, then for any constant $\Delta > 0$:

(1) If $\rho > 1$ then, with high probability, $\min_{i \in [n]} \deg(i) \ge \frac{\Delta}{\sqrt{\log n}} \mathbb{E} \deg(i)$.

(2) If $\rho < 1$ then, with high probability, $\min_{i \in [n]} \deg(i) = 0$. That is, $G$ has at
least one isolated node, thus being disconnected.

Part (2) of the Lemma is a classical result [24]; a particularly simple proof of it
proceeds by applying the second moment method to the number of isolated nodes
in $G$. For the sake of brevity we will skip those details, and focus on part (1). The
main thing to note in part (1) of Lemma 4.1 is that the lower bound on minimum
degree is asymptotically smaller than the average degree $\mathbb{E} \deg(i)$.


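Proof. [of part (1) of Lemma 4.1]

Let $p = \frac{\rho \log n}{n}$ and let $i$ denote a node of the graph; note that $\mathbb{E} \deg(i) = \frac{n-1}{n} \rho \log n$.
We use the Chernoff bound (see, for example, Lemma 2.3.3 in [23]) to establish, for any
$0 < t < 1$,

\[
\mathbb{P}\left[\deg(i) < t\, \mathbb{E} \deg(i)\right]
  \le \left[\frac{\exp(-(1-t))}{t^t}\right]^{\mathbb{E} \deg(i)}
  = \left[\frac{\exp(-(1-t))}{t^t}\right]^{\frac{n-1}{n} \rho \log n}
  = \exp\left(-\left[1 - t - t \log(1/t)\right] \frac{n-1}{n} \rho \log n\right).
\]

Taking $t = \frac{\Delta}{\sqrt{\log n}}$ gives, for $n$ large enough (so that $t < 1$), that the probability
that $\deg(i) < \frac{\Delta}{\sqrt{\log n}} \mathbb{E} \deg(i)$ is at most

\[
\exp\left(-\left[1 - \frac{\Delta}{\sqrt{\log n}} - \frac{\Delta}{\sqrt{\log n}} \log\left(\frac{\sqrt{\log n}}{\Delta}\right)\right] \frac{n-1}{n} \rho \log n\right),
\]

which is easily seen to be $\exp\left[-\rho \log n + O\left(\sqrt{\log n}\, \log\log n\right)\right]$. A simple union
bound over the $n$ vertices of $G$ gives

\[
\mathbb{P}\left[\min_{i \in [n]} \deg(i) < \frac{\Delta}{\sqrt{\log n}} \mathbb{E} \deg(i)\right]
  \le \exp\left[-(\rho - 1) \log n + O\left(\sqrt{\log n}\, \log\log n\right)\right].
\]

□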
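To get a feel for the quantities in this proof, the Chernoff expression and the resulting union bound can be evaluated numerically. The sketch below is only an illustration; the function names and the parameter values are chosen here and are not from the paper.

```python
import math


def chernoff_lower_tail(t, mean):
    """exp(-[1 - t - t*log(1/t)] * mean), the bound on P[deg(i) < t * E deg(i)]."""
    if not 0.0 < t < 1.0:
        raise ValueError("t must lie strictly between 0 and 1")
    return math.exp(-(1.0 - t - t * math.log(1.0 / t)) * mean)


def union_bound(n, rho, delta):
    """n times the Chernoff bound with t = delta / sqrt(log n) and p = rho*log(n)/n."""
    t = delta / math.sqrt(math.log(n))
    mean = (n - 1) / n * rho * math.log(n)  # E deg(i) = (n - 1) p
    return n * chernoff_lower_tail(t, mean)


# Illustrative evaluation; both values are upper bounds on a probability.
bound_rho_2 = union_bound(10**6, 2.0, 1.0)
bound_rho_3 = union_bound(10**6, 3.0, 1.0)
```

Because of the $O(\sqrt{\log n}\, \log\log n)$ correction in the exponent, the union bound can exceed 1 (and so be vacuous) for moderate $n$ when $\rho$ is close to 1; the statement of the Lemma is asymptotic in $n$.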

Using Theorem 3.1 we will show that, with high probability, as long as every
node in $G$ has degree at least $\frac{\Delta}{\sqrt{\log n}}$ of the average degree, for a suitable $\Delta$, then $G$ is
connected. This is made precise in the following Lemma.

Lemma 4.2. Let $n > 2$ be an integer and $\varepsilon > 0$. Suppose that $\frac{\varepsilon \log n}{n} < p < \frac{1}{2}$ and
$G$ a random graph drawn from $\mathcal{G}(n,p)$. There exists a constant $\Delta$ such that, with
high probability, the following holds: If

\[ \min_{i \in [n]} \deg(i) \ge \frac{\Delta}{\sqrt{\log n}} \mathbb{E} \deg(i), \]

then $G$ is a connected graph (note that the right hand side does not depend on $i$).

Before proving this Lemma, we note that Lemmas 4.1 and 4.2 immediately imply
the well known phase transition phenomenon.

Theorem 4.3. Let $n$ be a positive integer and $p = \frac{\rho \log n}{n}$.

(1) If $\rho > 1$ then, with high probability, a random graph drawn from $\mathcal{G}(n,p)$ is
connected.

(2) If $\rho < 1$ then, with high probability, a random graph drawn from $\mathcal{G}(n,p)$ has
at least one isolated node, thus being disconnected.
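The coincidence of the two thresholds can be explored empirically. The sketch below samples graphs with $p = \rho \log n / n$ and records how often they are connected and how often they contain an isolated node. All names, the seed, and the parameter values are assumptions made here for illustration; at small $n$ the transition is far from sharp.

```python
import math
import random
from collections import deque


def sample_gnp(n, p, rng):
    """Adjacency lists of an Erdos-Renyi graph: each pair is an edge w.p. p."""
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def is_connected(adj):
    """Breadth-first search from node 0; connected iff every node is reached."""
    n = len(adj)
    if n == 0:
        return True
    seen = [False] * n
    seen[0] = True
    queue = deque([0])
    reached = 1
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                reached += 1
                queue.append(v)
    return reached == n


def has_isolated_node(adj):
    return any(len(neighbors) == 0 for neighbors in adj)


def experiment(n, rho, trials, seed=0):
    """Fractions of sampled graphs that are connected / that have an isolated node."""
    rng = random.Random(seed)
    p = rho * math.log(n) / n
    connected = isolated = 0
    for _ in range(trials):
        adj = sample_gnp(n, p, rng)
        connected += is_connected(adj)
        isolated += has_isolated_node(adj)
    return connected / trials, isolated / trials


frac_connected, frac_isolated = experiment(n=200, rho=2.0, trials=20)
```

Running `experiment` for values of `rho` on both sides of 1 and for growing `n` is one way to see the two fractions trade places, as Theorem 4.3 predicts in the limit.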

While this phase transition is well understood, we find our proof through
Lemmas 4.1 and 4.2 enlightening, as it provides a simple explanation of why the phase
transition for disappearance of isolated nodes coincides with the one for connectivity.
Moreover, it also emphasizes a connection with the optimality of the semidefinite
relaxations in both $\mathbb{Z}_2$ Synchronization and the Stochastic Block Model that
we will discuss in the sections to follow.

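Proof. [of Lemma 4.2]

Let $L$ be the graph Laplacian of $G$. Note that $\mathbb{E}(L) = npI - p\mathbf{1}\mathbf{1}^T$, which means
that

\[ L = npI - p\mathbf{1}\mathbf{1}^T - [-L + \mathbb{E}(L)]. \]

Since $L\mathbf{1} = 0$, it is easy to see that $G$ is connected if and only if

\[ \lambda_{\max}[-L + \mathbb{E}(L)] < np. \]

We proceed by using Theorem 3.1 for

\[ \mathcal{L} = -L + \mathbb{E}(L). \]

The hypotheses of the Theorem are satisfied as the off-diagonal entries of $\mathcal{L}$ are
independent and

\[ \sum_{j \in [n] \setminus i} \mathbb{E} \mathcal{L}_{ij}^2 = (n-1)p(1-p) \gtrsim \log n \, \max_{i \neq j} \|\mathcal{L}_{ij}\|_\infty^2. \]

This guarantees that there exists a constant $C_1$ such that, with high probability,

\[ \lambda_{\max}[-L + \mathbb{E}(L)] \le \left(1 + \frac{C_1}{\sqrt{\log n}}\right) \max_{i \in [n]} \left[-\deg(i) + (n-1)p\right], \]

where $\deg(i) = L_{ii}$ is the degree of node $i$. Equivalently,

\[ \lambda_{\max}[-L + \mathbb{E}(L)] \le np + \left\{ \left(1 + \frac{C_1}{\sqrt{\log n}}\right) \left[-\min_{i \in [n]} \deg(i) + (n-1)p\right] - np \right\}. \tag{15} \]

This means that, as long as (15) holds, then

\[ \left(1 + \frac{C_1}{\sqrt{\log n}}\right) \left[-\min_{i \in [n]} \deg(i) + (n-1)p\right] - np < 0 \]

implies the connectivity of $G$. Straightforward manipulations show that this condition
is equivalent to

\[ \min_{i \in [n]} \deg(i) > np\, \frac{C_1}{\sqrt{\log n} + C_1} - p, \]

which is implied by

\[ \min_{i \in [n]} \deg(i) \ge np\, \frac{C_1}{\sqrt{\log n}}. \tag{16} \]

The lemma follows by taking $\Delta = 2C_1$.

□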

4.2. Synchronization over the group of two elements. Recall the setting of
$\mathbb{Z}_2$ Synchronization [1, 2]. Given an underlying graph $G$ with $n$ nodes, the task
is to recover a binary vector $z \in \{\pm 1\}^n$ from noisy measurements $Y_{ij}$ of $z_i z_j$.
Following [1, 2] we will take the underlying graph $G$ to be an Erdős-Rényi graph
$\mathcal{G}(n,p)$ and, for each edge $(i,j) \in G$,

\[
Y_{ij} = \begin{cases}
z_i z_j & \text{with probability } 1 - \varepsilon \\
-z_i z_j & \text{with probability } \varepsilon,
\end{cases}
\]

where $\varepsilon < 1/2$ represents the noise level. We are interested in understanding for
which values of $p$ and $\varepsilon$ it is possible to exactly recover $z$. It is easy to see that,
just like in the example in Section 2, the maximum likelihood estimator is given
by (1). Similarly, we consider its semidefinite relaxation (3) and investigate when
$X = zz^T$ is the unique solution of (3).

It is easy to see that $Y$ is given by

\[ Y = \mathrm{diag}(z) (A_G - 2A_H) \mathrm{diag}(z), \]

where $A_G$ is the adjacency matrix of the underlying graph and $A_H$ is the adjacency
matrix of the graph consisting of the corrupted edges. In this case we want conditions on
$\varepsilon$ and $p$ under which $zz^T$ is the unique solution to:

\[
\begin{array}{ll}
\max & \mathrm{Tr}\left[\mathrm{diag}(z) (A_G - 2A_H) \mathrm{diag}(z) X\right] \\
\text{s.t.} & X_{ii} = 1 \\
 & X \succeq 0.
\end{array}
\tag{17}
\]

Lemma 2.2 states that $zz^T$ is indeed the unique solution as long as the second
smallest eigenvalue of

\[ D_{[A_G - 2A_H]} - \mathrm{diag}(z) (A_G - 2A_H) \mathrm{diag}(z) = D_G - 2D_H - \mathrm{diag}(z) (A_G - 2A_H) \mathrm{diag}(z) \tag{18} \]

is strictly positive. As $\mathrm{diag}(z) (D_G - 2D_H) \mathrm{diag}(z) = D_G - 2D_H$ and conjugating
by $\mathrm{diag}(z)$ does not alter the eigenvalues, the second smallest eigenvalue of (18)
being strictly positive is equivalent to

\[ \lambda_2\left(D_G - A_G - 2(D_H - A_H)\right) > 0. \tag{19} \]

Since $D_G - A_G - 2(D_H - A_H) = L_G - 2L_H$, where $L_G$ and $L_H$ are the Laplacians
of, respectively, $G$ and $H$, we define $L_{\mathrm{Synch}}$ and write the condition in terms of
$L_{\mathrm{Synch}}$.

Definition 4.4. [$L_{\mathrm{Synch}}$] In the setting described above,

\[ L_{\mathrm{Synch}} = L_G - 2L_H, \]

where $G$ is the graph of all measurements and $H$ is the graph of wrong measurements.

Then, (19) is equivalent to $\lambda_2(L_{\mathrm{Synch}}) > 0$. The following Lemma readily follows
by noting that $\mathbb{E}[L_{\mathrm{Synch}}] = np(1 - 2\varepsilon) I_{n \times n} - p(1 - 2\varepsilon) \mathbf{1}\mathbf{1}^T$.

Lemma 4.5. Consider the $\mathbb{Z}_2$ Synchronization problem defined above and $L_{\mathrm{Synch}}$
defined in Definition 4.4. As long as

\[ \lambda_{\max}\left(-L_{\mathrm{Synch}} + \mathbb{E}[L_{\mathrm{Synch}}]\right) < np(1 - 2\varepsilon), \]

the Semidefinite program (17) achieves exact recovery.
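The objects in Definition 4.4 are easy to build explicitly on a toy instance. The sketch below constructs $Y = \mathrm{diag}(z)(A_G - 2A_H)\mathrm{diag}(z)$ and $L_{\mathrm{Synch}} = L_G - 2L_H$ for a triangle with one corrupted measurement; the helper names and the instance are illustrative choices made here.

```python
def laplacian(A):
    """Return D_A - A for a symmetric matrix A given as a list of rows."""
    n = len(A)
    L = [[-A[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        L[i][i] += sum(A[i])
    return L


def synch_matrices(A_G, A_H, z):
    """Return (Y, L_synch) with Y = diag(z)(A_G - 2 A_H)diag(z), L_synch = L_G - 2 L_H."""
    n = len(z)
    Y = [[z[i] * (A_G[i][j] - 2 * A_H[i][j]) * z[j] for j in range(n)]
         for i in range(n)]
    L_G, L_H = laplacian(A_G), laplacian(A_H)
    L_synch = [[L_G[i][j] - 2 * L_H[i][j] for j in range(n)] for i in range(n)]
    return Y, L_synch


# Triangle graph in which the measurement on edge (0, 1) is corrupted.
A_G = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
A_H = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
z = [1, -1, 1]
Y, L_synch = synch_matrices(A_G, A_H, z)
# Diagonal entry i counts non-corrupted minus corrupted edges at node i.
diagonal = [L_synch[i][i] for i in range(3)]
```

In this instance `Y[0][1]` equals $+1$ although $z_0 z_1 = -1$, reflecting the corrupted edge, and nodes 0 and 1 each have one good and one bad edge, so their diagonal entries vanish.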

In [1, 2], this largest eigenvalue is estimated using general purpose matrix
concentration inequalities (such as the ones in [44]), obtaining a suboptimal bound.
In contrast, we will do this estimate using Theorem 3.1.

Let us define, for a node $i$, $\deg_+(i)$ as the number of non-corrupted edges incident
to $i$ and $\deg_-(i)$ as the number of corrupted edges incident to $i$. We start by
obtaining the following theorem.

Theorem 4.6. As long as $n > 2$, $p >$ and $p(1 - 2\varepsilon)^2 <$ there exists $\Delta > 0$
such that, with high probability, the following holds: If

\[ \min_{i \in [n]} \left[\deg_+(i) - \deg_-(i)\right] \ge \frac{\Delta}{\sqrt{\log n}} \mathbb{E}\left[\deg_+(i) - \deg_-(i)\right], \tag{20} \]

then the semidefinite program (17) achieves exact recovery.


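Proof. [of Theorem 4.6]

The idea is to apply Theorem 3.1 to $L = -L_{\mathrm{Synch}} + \mathbb{E}[L_{\mathrm{Synch}}]$. Note that $L$ has
independent off-diagonal entries and

\[ \sum_{j\in[n]\setminus i} \mathbb{E}\left[L_{ij}^2\right] = (n-1)\left(p - p^2(1-2\varepsilon)^2\right) \ge \frac{1}{4}np > \frac{1}{8}\log n \ge \frac{\left[1+p(1-2\varepsilon)\right]^2}{8(1+\sqrt{2})^2}\,\log n = \frac{\log n}{8(1+\sqrt{2})^2}\,\max_{i\neq j}\|L_{ij}\|_\infty^2. \]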


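Hence, there exists a constant $\Delta'$ such that, with high probability,

\[ \lambda_{\max}\left(-L_{\mathrm{Synch}} + \mathbb{E}[L_{\mathrm{Synch}}]\right) \le \left(1 + \frac{\Delta'}{\sqrt{\log n}}\right) \max_{i\in[n]} \left[-(L_{\mathrm{Synch}})_{ii} + \mathbb{E}\left[(L_{\mathrm{Synch}})_{ii}\right]\right]. \]

We just need to show that there exists $\Delta > 0$ such that, if (20) holds, then

\[ \left(1 + \frac{\Delta'}{\sqrt{\log n}}\right) \max_{i\in[n]} \left[-(L_{\mathrm{Synch}})_{ii} + \mathbb{E}\left[(L_{\mathrm{Synch}})_{ii}\right]\right] < np(1-2\varepsilon). \tag{21} \]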


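Recall that $(L_{\mathrm{Synch}})_{ii} = \deg_+(i) - \deg_-(i)$ and $\mathbb{E}(L_{\mathrm{Synch}})_{ii} = (n-1)p(1-2\varepsilon)$.
We can rewrite (21) as

\[ \min_{i\in[n]} (L_{\mathrm{Synch}})_{ii} > (n-1)p(1-2\varepsilon) - np(1-2\varepsilon)\left(1 + \frac{\Delta'}{\sqrt{\log n}}\right)^{-1}. \]

Straightforward algebraic manipulations show that there exists a constant $\Delta$
such that

\[ (n-1)p(1-2\varepsilon) - np(1-2\varepsilon)\left(1 + \frac{\Delta'}{\sqrt{\log n}}\right)^{-1} \le \frac{\Delta}{\sqrt{\log n}}\, \mathbb{E}\left[\deg_+(i) - \deg_-(i)\right], \]

proving the Theorem.

□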
We note that, if $p < \frac{\log n}{2n}$, then Theorem 4.3 implies that, with high probability,
the underlying graph is disconnected, implying impossibility of exact recovery. We
also note that if we do not have

\[ \min_{i\in[n]} \left[\deg_+(i) - \deg_-(i)\right] \ge 0, \tag{22} \]

then the maximum likelihood does not match the ground truth, rendering exact
recovery unrealistic®. The optimality of this analysis hinges upon the fact that the
right-hand side of (20) is asymptotically smaller than the expectation of $\deg_+(i) - \deg_-(i)$,
suggesting that (20) and (22) have similar probabilities and the same phase
transition.
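The comparison between conditions (20) and (22) can be explored empirically. The sketch below is illustrative only: the helper names `degree_margins` and `condition_20_holds`, and the choice to pass the constant $\Delta$ as a free parameter `delta_const`, are assumptions made here, since the analysis only asserts that a suitable $\Delta$ exists. It samples the quantities $\deg_+(i) - \deg_-(i)$ and evaluates the right-hand side of (20) with $\mathbb{E}[\deg_+(i) - \deg_-(i)] = (n-1)p(1-2\varepsilon)$.

```python
import math
import random


def degree_margins(n, p, eps, seed=0):
    """Return the list of deg_plus(i) - deg_minus(i) for one sampled instance."""
    rng = random.Random(seed)
    margin = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:  # pair {i, j} is measured
                sign = -1 if rng.random() < eps else 1  # -1 for a corrupted edge
                margin[i] += sign
                margin[j] += sign
    return margin


def condition_20_holds(margin, n, p, eps, delta_const):
    """Check min_i margin[i] >= (delta_const / sqrt(log n)) * E[margin]."""
    expected = (n - 1) * p * (1 - 2 * eps)
    return min(margin) >= delta_const / math.sqrt(math.log(n)) * expected
```

Setting `delta_const = 0` reduces the check to condition (22), so the two conditions can be compared on the same sample.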

The next Theorem establishes the optimality of the semidefinite programming 
based approach in a particular regime, solving a problem raised in [1, 2]. While it 
is clear that one can use Theorem 4.6 to establish similar results for many other 
regimes (for some, through estimates similar to the ones in Lemma 4.14), the main 
purpose of this paper is not to perform a detailed analysis of this problem but 
rather to illustrate the efficacy of these semidefinite relaxations and the fundamental
connections between these different phenomena, through Theorem 3.1. The
independent parallel research efforts of Hajek et al. [29] address other regimes for
this particular problem; we refer the interested reader there.


Corollary 4.7. As long as $\varepsilon < \frac{1}{2}$ and $p(1-2\varepsilon)^2 \le \frac{1}{2}$, there exists a constant $K$
for which the following holds: If there exists $\delta > 0$ such that

\[ (n-1)p \ge (1+\delta)\,\frac{2}{(1-2\varepsilon)^2}\left(1 + \frac{K}{\sqrt{\log n}} + \frac{2}{3}(1-2\varepsilon)\right)\log n, \tag{23} \]

then the semidefinite program (17) achieves exact recovery with high probability.


Before proving this corollary we emphasize how it solves the problem, raised
in [1, 2], of whether the semidefinite programming approach for $\mathbb{Z}_2$ Synchronization
is optimal in the low signal-to-noise regime. In fact, the results in [1, 2] ensure that
the threshold in Corollary 4.7 is optimal for, at least, an interesting range of values
of $\varepsilon$. Empowered with Theorem 4.6, the proof of this corollary becomes rather
elementary.


Proof. [of Corollary 4.7]

This corollary will be established with a simple use of Bernstein's inequality.
Our goal is to show that, given $\Delta$, there exists a $K$ and $\delta$ such that, under the
hypothesis of the Corollary,

\[ \min_{i\in[n]} \left[\deg_+(i) - \deg_-(i)\right] \ge \frac{\Delta}{\sqrt{\log n}}\, \mathbb{E}\left[\deg_+(i) - \deg_-(i)\right] \]

holds with high probability. This implies, via Theorem 4.6, that the semidefinite
program (17) achieves exact recovery with high probability.

We will consider $n$ to be large enough. We start by noting that it suffices to
show that there exists $\delta > 0$ such that, for each $i \in [n]$ separately,

\[ \mathbb{P}\left[\deg_+(i) - \deg_-(i) < \frac{\Delta}{\sqrt{\log n}}\, \mathbb{E}\left[\deg_+(i) - \deg_-(i)\right]\right] \le n^{-(1+\delta)}. \tag{24} \]


^Recall that, if we assume a uniform prior, the MLE is the method that maximizes the
probability of exact recovery.
Indeed, (24) together with a union bound over the $n$ nodes of the graph would
establish the Corollary.

Throughout the rest of the proof we will fix $i \in [n]$ and use $\deg_+$ and $\deg_-$ to
denote, respectively, $\deg_+(i)$ and $\deg_-(i)$. It is easy to see that

\[ \deg_+ - \deg_- = (n-1)p(1-2\varepsilon) - \sum_{j=1}^{n-1} x_j, \]

where $x_j$ are i.i.d. centered random variables with distribution

\[ x_j = \begin{cases} -1 + p(1-2\varepsilon) & \text{with probability } p(1-\varepsilon), \\ 1 + p(1-2\varepsilon) & \text{with probability } p\varepsilon, \\ p(1-2\varepsilon) & \text{with probability } 1-p. \end{cases} \]

For any $t > 0$ Bernstein's inequality gives

\[ \mathbb{P}\left[\sum_{j=1}^{n-1} x_j > t\right] \le \exp\left(-\frac{t^2/2}{(n-1)\,\mathbb{E}x_j^2 + \frac{t}{3}\|x_j\|_\infty}\right). \]
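As a sanity check on the ingredients of this bound, the sketch below evaluates the three-point distribution of $x_j$ and the Bernstein tail bound numerically. The function names `x_distribution` and `bernstein_bound` are hypothetical, and the bound is written in its standard form with the factor $t/3$ multiplying the sup-norm, which is an assumption about the exact form intended here.

```python
import math


def x_distribution(p, eps):
    """Return [(value, probability), ...] for the centered variable x_j."""
    m = p * (1 - 2 * eps)  # mean contribution of one potential edge
    return [(-1 + m, p * (1 - eps)), (1 + m, p * eps), (m, 1 - p)]


def bernstein_bound(n, p, eps, t):
    """Bernstein upper bound on P[sum_{j=1}^{n-1} x_j > t]."""
    dist = x_distribution(p, eps)
    second_moment = sum(v * v * w for v, w in dist)  # equals p - p^2 (1 - 2 eps)^2
    sup_norm = max(abs(v) for v, _ in dist)          # equals 1 + p (1 - 2 eps)
    return math.exp(-(t * t / 2) / ((n - 1) * second_moment + t * sup_norm / 3))
```

For example, `x_distribution(0.3, 0.1)` has mean zero and second moment $0.3 - 0.24^2$, in agreement with $\mathbb{E}x_j^2 = p - p^2(1-2\varepsilon)^2$.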


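Taking $t = \left(1 - \frac{\Delta}{\sqrt{\log n}}\right)(n-1)p(1-2\varepsilon)$ gives

\[
\begin{aligned}
\mathbb{P}\left[\deg_+ - \deg_- < \frac{\Delta}{\sqrt{\log n}}\,\mathbb{E}\left[\deg_+ - \deg_-\right]\right]
&\le \exp\left(-\frac{\left[\left(1 - \frac{\Delta}{\sqrt{\log n}}\right)(n-1)p(1-2\varepsilon)\right]^2/2}{(n-1)\,\mathbb{E}x_j^2 + \frac{1}{3}\|x_j\|_\infty\left(1 - \frac{\Delta}{\sqrt{\log n}}\right)(n-1)p(1-2\varepsilon)}\right) \\
&= \exp\left(-\frac{\left(1 - \frac{\Delta}{\sqrt{\log n}}\right)^2 (n-1)p(1-2\varepsilon)^2/2}{\frac{1}{p}\mathbb{E}x_j^2 + \frac{1}{3}\|x_j\|_\infty\left(1 - \frac{\Delta}{\sqrt{\log n}}\right)(1-2\varepsilon)}\right).
\end{aligned}
\]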



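Condition (23) (for a $K$ to be determined later) guarantees that

\[ (n-1)p(1-2\varepsilon)^2/2 \ge (1+\delta)\left(1 + \frac{K}{\sqrt{\log n}} + \frac{2}{3}(1-2\varepsilon)\right)\log n, \]

meaning that we just need to show that there exists $K > 0$ for which

\[ \frac{\left(1 - \frac{\Delta}{\sqrt{\log n}}\right)^2\left(1 + \frac{K}{\sqrt{\log n}} + \frac{2}{3}(1-2\varepsilon)\right)}{\frac{1}{p}\mathbb{E}x_j^2 + \frac{1}{3}\|x_j\|_\infty\left(1 - \frac{\Delta}{\sqrt{\log n}}\right)(1-2\varepsilon)} \ge 1. \]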


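Note that $\frac{1}{p}\mathbb{E}x_j^2 = 1 - p(1-2\varepsilon)^2 \le 1$ and $\|x_j\|_\infty = 1 + p(1-2\varepsilon) \le 2$,
implying that

\[ \frac{1}{p}\mathbb{E}x_j^2 + \frac{1}{3}\|x_j\|_\infty\left(1 - \frac{\Delta}{\sqrt{\log n}}\right)(1-2\varepsilon) \le 1 + \frac{2}{3}(1-2\varepsilon). \]

Also, $\left(1 - \frac{\Delta}{\sqrt{\log n}}\right)^2 \ge 1 - \frac{2\Delta}{\sqrt{\log n}}$. The corollary is then proved by noting that there
exists $K > 0$ such that

\[ \frac{K}{\sqrt{\log n}} \ge \frac{2\Delta K}{\log n} + \frac{2\Delta}{\sqrt{\log n}}\left(1 + \frac{2}{3}(1-2\varepsilon)\right). \]

□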


4.3. Stochastic Block Model with two communities. We shift our attention
to the problem of exact recovery of the stochastic block model with two communities.
Recall Definition 1.2: for $n$ even and $0 \le q < p \le 1$, we say that a graph
$G$ with $n$ nodes is drawn from the Stochastic Block Model with two communities
$\mathcal{G}(n,p,q)$ if the nodes are divided in two sets of $\frac{n}{2}$ nodes each, and for each pair
of vertices $i,j$, $(i,j)$ is an edge of $G$ with probability $p$ if $i$ and $j$ are in the same
cluster and $q$ otherwise, independently from any other edge. Let $g \in \{\pm 1\}^n$ be a
vector that is $1$ in one of the clusters and $-1$ in the other; our task is to recover $g$.

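The maximum likelihood estimator for $g$ is given by

\[
\begin{array}{ll}
\max & x^T B x \\
\text{s.t.} & x \in \mathbb{R}^n \\
 & x_i^2 = 1, \\
 & \sum_{i=1}^n x_i = 0,
\end{array}
\tag{25}
\]

where $B$ is the signed adjacency of $G$, meaning that $B_{ij} = 1$ if $(i,j)$ is an edge
of $G$ and $B_{ij} = -1$ otherwise. Note that $B = 2A - (\mathbf{1}\mathbf{1}^T - I)$ where $A$ is the
adjacency matrix. We will drop the balanced constraint $\sum_{i=1}^n x_i = 0$, arriving
at (1) for $Y = B$. The intuitive justification is that there are enough $-1$ entries
in $B$ to discourage unbalanced solutions. As in the problems considered above, we
will consider the semidefinite relaxation (3).

\[
\begin{array}{ll}
\max & \mathrm{Tr}\left[\left(2A - (\mathbf{1}\mathbf{1}^T - I)\right)X\right] \\
\text{s.t.} & X_{ii} = 1 \\
 & X \succeq 0.
\end{array}
\tag{26}
\]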


We want to understand when it is that $X = gg^T$ is the unique solution of (26).
Lemma 2.2 shows that $gg^T$ is indeed the unique solution of (26) as long as the
second smallest eigenvalue of

\[ D_{[\mathrm{diag}(g)(2A - (\mathbf{1}\mathbf{1}^T - I))\mathrm{diag}(g)]} - \left[2A - (\mathbf{1}\mathbf{1}^T - I)\right], \tag{27} \]

is strictly positive.

Let us introduce a new matrix. 


Definition 4.8. [$\Gamma_{\mathrm{SBM}}$] Given a graph $G$ drawn from the stochastic block model
with two clusters,

\[ \Gamma_{\mathrm{SBM}} = D_+ - D_- - A, \]

where $D_+$ is a diagonal matrix of inner degrees, $D_-$ is a diagonal matrix of outer
degrees and $A$ is the adjacency matrix of the graph.

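It is easy to see that $D_{[\mathrm{diag}(g)A\,\mathrm{diag}(g)]} = D_+ - D_-$. In fact,

\[ D_{[\mathrm{diag}(g)(2A - (\mathbf{1}\mathbf{1}^T - I))\mathrm{diag}(g)]} - \left[2A - (\mathbf{1}\mathbf{1}^T - I)\right] = 2\Gamma_{\mathrm{SBM}} + \mathbf{1}\mathbf{1}^T, \]

which means that $gg^T$ is the unique solution of (26) as long as $\lambda_2\left(2\Gamma_{\mathrm{SBM}} + \mathbf{1}\mathbf{1}^T\right) > 0$.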
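The two identities just used can be checked numerically on a sampled graph. The following is a small sketch under stated assumptions: the helper names (`sbm_adjacency`, `gamma_sbm`, `certificate_matrix`) are hypothetical, the first $n/2$ nodes form one cluster, and matrices are stored as dense integer lists. It builds $\Gamma_{\mathrm{SBM}} = D_+ - D_- - A$ and the matrix $D_{[\mathrm{diag}(g)B\,\mathrm{diag}(g)]} - B$ for $B = 2A - (\mathbf{1}\mathbf{1}^T - I)$, so that the relation to $2\Gamma_{\mathrm{SBM}} + \mathbf{1}\mathbf{1}^T$ can be compared entry by entry.

```python
import random


def sbm_adjacency(n, p, q, seed=0):
    """Sample a two-community SBM graph; n must be even."""
    rng = random.Random(seed)
    g = [1] * (n // 2) + [-1] * (n // 2)  # cluster labels
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            prob = p if g[i] == g[j] else q
            if rng.random() < prob:
                A[i][j] = A[j][i] = 1
    return A, g


def gamma_sbm(A, g):
    """Gamma_SBM = D_plus - D_minus - A (inner minus outer degrees on the diagonal)."""
    n = len(A)
    G = [[-A[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        deg_in = sum(A[i][j] for j in range(n) if g[j] == g[i])
        deg_out = sum(A[i][j] for j in range(n) if g[j] != g[i])
        G[i][i] = deg_in - deg_out
    return G


def certificate_matrix(A, g):
    """D_[diag(g) B diag(g)] - B for the signed adjacency B = 2A - (11^T - I)."""
    n = len(A)
    B = [[2 * A[i][j] - (0 if i == j else 1) for j in range(n)] for i in range(n)]
    M = [[-B[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        # Row sum of diag(g) B diag(g) goes on the diagonal.
        M[i][i] += sum(g[i] * B[i][j] * g[j] for j in range(n))
    return M
```

On any balanced instance, `certificate_matrix(A, g)[i][j]` should equal `2 * gamma_sbm(A, g)[i][j] + 1` for all `i, j`, and the rows of $\mathrm{diag}(g)\,\Gamma_{\mathrm{SBM}}\,\mathrm{diag}(g)$ should sum to zero.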

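Note that

\[ \mathbb{E}\left[2\Gamma_{\mathrm{SBM}} + \mathbf{1}\mathbf{1}^T\right] = n(p-q)\left(I_{n\times n} - \frac{gg^T}{n}\right) + n\left(1 - (p+q)\right)\frac{\mathbf{1}\mathbf{1}^T}{n}. \]

If we suppose that $p < \frac{1}{2}$ we have $1 - (p+q) > p - q$, so the second smallest
eigenvalue of $\mathbb{E}\left[2\Gamma_{\mathrm{SBM}} + \mathbf{1}\mathbf{1}^T\right]$ is $n(p-q)$. This establishes the following Lemma.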

Lemma 4.9. Let $n > 4$ be even and let $G$ be drawn from $\mathcal{G}(n,p,q)$ with edge
probabilities $p < \frac{1}{2}$ and $q < p$. As long as

\[ \lambda_{\max}\left(-\Gamma_{\mathrm{SBM}} + \mathbb{E}[\Gamma_{\mathrm{SBM}}]\right) < \frac{n}{2}(p-q), \]

the semidefinite program (26) for the stochastic block model problem achieves exact
recovery, meaning that $gg^T$ is its unique solution.

Estimating this largest eigenvalue using Theorem 3.1, we obtain the following 
theorem. 


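Theorem 4.10. Let $n > 4$ be even and let $G$ be drawn from $\mathcal{G}(n,p,q)$. As long as
$\frac{\log n}{3n} < p < \frac{1}{2}$ and $q < p$, then there exists $\Delta > 0$ such that, with high probability,
the following holds: If

\[ \min_i \left(\deg_{\mathrm{in}}(i) - \deg_{\mathrm{out}}(i)\right) \ge \frac{\Delta}{\sqrt{\log n}}\, \mathbb{E}\left[\deg_{\mathrm{in}}(i) - \deg_{\mathrm{out}}(i)\right] \tag{28} \]

then the semidefinite program (26) achieves exact recovery.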


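Proof. The idea is again to apply Theorem 3.1. One obstacle is that $\Gamma_{\mathrm{SBM}}$ is not a
Laplacian matrix. Let $g$ denote the vector that is $1$ in a cluster and $-1$ in the other,
and let $\mathrm{diag}(g)$ denote a diagonal matrix with the entries of $g$ on the diagonal. We
define

\[ \Gamma'_{\mathrm{SBM}} = \mathrm{diag}(g)\,\Gamma_{\mathrm{SBM}}\,\mathrm{diag}(g). \]

Note that $\Gamma'_{\mathrm{SBM}}$ is a Laplacian and both the eigenvalues and diagonal elements of
$\mathbb{E}[\Gamma'_{\mathrm{SBM}}] - \Gamma'_{\mathrm{SBM}}$ are the same as those of $\mathbb{E}[\Gamma_{\mathrm{SBM}}] - \Gamma_{\mathrm{SBM}}$.

We apply Theorem 3.1 to $L = -\Gamma'_{\mathrm{SBM}} + \mathbb{E}[\Gamma'_{\mathrm{SBM}}]$. Note that $L$ has independent
off-diagonal entries and

\[ \sum_{j\in[n]\setminus i} \mathbb{E}\left[L_{ij}^2\right] = \left(\frac{n}{2} - 1\right)(p - p^2) + \frac{n}{2}(q - q^2) \ge \frac{\log n}{24} \ge \frac{\log n}{24}(1-q)^2 = \frac{\log n}{24}\max_{i\neq j}\|L_{ij}\|_\infty^2. \]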

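Hence, there exists a constant $\Delta'$ such that, with high probability,

\[ \lambda_{\max}\left(-\Gamma'_{\mathrm{SBM}} + \mathbb{E}[\Gamma'_{\mathrm{SBM}}]\right) \le \left(1 + \frac{\Delta'}{\sqrt{\log n}}\right)\max_{i\in[n]}\left[-(\Gamma'_{\mathrm{SBM}})_{ii} + \mathbb{E}\left[(\Gamma'_{\mathrm{SBM}})_{ii}\right]\right], \]

which is equivalent to

\[ \lambda_{\max}\left(-\Gamma_{\mathrm{SBM}} + \mathbb{E}[\Gamma_{\mathrm{SBM}}]\right) \le \left(1 + \frac{\Delta'}{\sqrt{\log n}}\right)\max_{i\in[n]}\left[-(\Gamma_{\mathrm{SBM}})_{ii} + \mathbb{E}\left[(\Gamma_{\mathrm{SBM}})_{ii}\right]\right]. \tag{29} \]

We just need to show that there exists $\Delta > 0$ such that, if (28) holds, then

\[ \left(1 + \frac{\Delta'}{\sqrt{\log n}}\right)\max_{i\in[n]}\left[-(\Gamma_{\mathrm{SBM}})_{ii} + \mathbb{E}\left[(\Gamma_{\mathrm{SBM}})_{ii}\right]\right] < \frac{n}{2}(p-q) - p. \tag{30} \]

Note that $(\Gamma_{\mathrm{SBM}})_{ii} = \deg_{\mathrm{in}}(i) - \deg_{\mathrm{out}}(i)$ and

\[ \mathbb{E}\left[\deg_{\mathrm{in}}(i) - \deg_{\mathrm{out}}(i)\right] = \frac{n}{2}(p-q) - p. \]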
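Condition (28) can thus be rewritten as

\[ \max_{i\in[n]}\left[-(\Gamma_{\mathrm{SBM}})_{ii} + \mathbb{E}\left[(\Gamma_{\mathrm{SBM}})_{ii}\right]\right] \le \left(1 - \frac{\Delta}{\sqrt{\log n}}\right)\left(\frac{n}{2}(p-q) - p\right). \]

The Theorem is then proven by noting that, for any $\Delta'$, there exists $\Delta$ such that

\[ \left(1 - \frac{\Delta}{\sqrt{\log n}}\right)\left(\frac{n}{2}(p-q) - p\right) \le \left(1 + \frac{\Delta'}{\sqrt{\log n}}\right)^{-1}\left(\frac{n}{2}(p-q) - p\right). \]

□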

As a corollary of this theorem we can establish a sharp threshold for exact
recovery for the stochastic block model of two clusters, solving a problem posed
in [3]. We recall that this problem was simultaneously solved by the parallel research
efforts of Hajek et al. [28].

We first show a Lemma concerning $\min_i \left(\deg_{\mathrm{in}}(i) - \deg_{\mathrm{out}}(i)\right)$, analogous to
Lemma 4.1.


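Lemma 4.11. Let $G$ be a random graph with $n$ nodes drawn accordingly to the
stochastic block model on two communities with edge probabilities $p$ and $q$. Let
$p = \frac{\alpha\log n}{n}$ and $q = \frac{\beta\log n}{n}$, where $\alpha > \beta$ are constants. Then for any constant
$\Delta > 0$:

(1) If

\[ \sqrt{\alpha} - \sqrt{\beta} > \sqrt{2}, \tag{31} \]

then, with high probability,

\[ \min_i \left(\deg_{\mathrm{in}}(i) - \deg_{\mathrm{out}}(i)\right) \ge \frac{\Delta}{\sqrt{\log n}}\, \mathbb{E}\left[\deg_{\mathrm{in}}(i) - \deg_{\mathrm{out}}(i)\right]. \]

(2) On the other hand, if

\[ \sqrt{\alpha} - \sqrt{\beta} < \sqrt{2}, \tag{32} \]

then, with high probability,

\[ \min_i \left(\deg_{\mathrm{in}}(i) - \deg_{\mathrm{out}}(i)\right) < 0, \]

and exact recovery is impossible.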


Part (2) is proven in [3], so we will focus on part (1). Before proving this lemma 
we note how, together with Theorem 4.10, this immediately implies the following 
Corollary. 


Corollary 4.12. Let $G$ be a random graph with $n$ nodes drawn accordingly to the
stochastic block model on two communities with edge probabilities $p$ and $q$. Let
$p = \frac{\alpha\log n}{n}$ and $q = \frac{\beta\log n}{n}$, where $\alpha > \beta$ are constants. Then, as long as

\[ \sqrt{\alpha} - \sqrt{\beta} > \sqrt{2}, \tag{33} \]

the semidefinite program (26) coincides with the true partition with high probability.
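For orientation, the threshold in Corollary 4.12 is easy to evaluate for given parameters. The sketch below uses hypothetical helper names and simply translates the scaling $p = \alpha\log n/n$, $q = \beta\log n/n$ and the condition $\sqrt{\alpha} - \sqrt{\beta} > \sqrt{2}$ into code. It says nothing about finite-$n$ behaviour, which the asymptotic statement does not cover.

```python
import math


def edge_probabilities(n, alpha, beta):
    """Return (p, q) in the logarithmic-degree regime of the corollary."""
    scale = math.log(n) / n
    return alpha * scale, beta * scale


def exact_recovery_regime(alpha, beta):
    """True when sqrt(alpha) - sqrt(beta) > sqrt(2), i.e. condition (33)."""
    return math.sqrt(alpha) - math.sqrt(beta) > math.sqrt(2)
```

For instance, $(\alpha,\beta) = (9,1)$ satisfies the condition since $3 - 1 > \sqrt{2}$, while $(\alpha,\beta) = (3,1)$ does not.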

In order to establish Lemma 4.11 we will borrow an estimate from [3]. 

Definition 4.13. [Definition 3 in [3]] Let $m$ be a natural number, $p,q \in [0,1]$, and
$\delta \in \mathbb{R}$, we define

\[ T(m,p,q,\delta) = \mathbb{P}\left[\sum_{i=1}^m (Z_i - W_i) \ge \delta\right], \]

where $W_1,\ldots,W_m$ are i.i.d. Bernoulli($p$) and $Z_1,\ldots,Z_m$ are i.i.d. Bernoulli($q$),
independent of $W_1,\ldots,W_m$.
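For small $m$ the tail probability in Definition 4.13 can be computed exactly by convolving the distribution of a single difference $Z_i - W_i$, which takes values in $\{-1,0,1\}$. The sketch below is illustrative: the function name `tail_T` is hypothetical, and the event is read as $\sum_i (Z_i - W_i) \ge \delta$, which is an assumption about whether the inequality in the definition is strict.

```python
def tail_T(m, p, q, delta):
    """Exact P[sum_{i=1}^m (Z_i - W_i) >= delta], Z_i ~ Bernoulli(q), W_i ~ Bernoulli(p)."""
    # Distribution of one summand Z_i - W_i.
    step = {
        -1: (1 - q) * p,                # Z = 0, W = 1
        0: q * p + (1 - q) * (1 - p),   # Z = W
        1: q * (1 - p),                 # Z = 1, W = 0
    }
    dist = {0: 1.0}  # distribution of the partial sum, starting from an empty sum
    for _ in range(m):
        new = {}
        for s, ps in dist.items():
            for d, pd in step.items():
                new[s + d] = new.get(s + d, 0.0) + ps * pd
        dist = new
    return sum(pr for s, pr in dist.items() if s >= delta)
```

The cost is quadratic in $m$, which is adequate for checking small cases against asymptotic estimates such as the one in Lemma 4.14.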

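Lemma 4.14. Recall Definition 4.13. Let $\alpha$, $\beta$, and $\Delta'$ be constants. Then,

\[ T\left(\frac{n}{2}, \frac{\alpha\log n}{n}, \frac{\beta\log n}{n}, -\Delta'\sqrt{\log n}\right) \le \exp\left[-\left(\frac{\alpha+\beta}{2} - \sqrt{\alpha\beta} - \delta(n)\right)\log n\right], \]

with $\lim_{n\to\infty} \delta(n) = 0$.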


Proof. The proof of this Lemma is obtained by straightforward adaptations to the 
proof of Lemma 8 in [3]. 

□ 

We are now ready to prove Lemma 4.11. 

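Proof. [of Lemma 4.11]

Let $\alpha > \beta$ be constants satisfying condition (31). Given $\Delta > 0$, we want to show
that, with high probability,

\[ \min_i \left(\deg_{\mathrm{in}}(i) - \deg_{\mathrm{out}}(i)\right) \ge \frac{\Delta}{\sqrt{\log n}}\,\frac{n}{2}(p-q). \tag{34} \]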

Let us fix i throughout the rest of the proof. It is clear that we can write 


(34) 


1-1 


1/2 


t/2 


degi„(i) - deg„,,i(i) = X! H = - Z,) + Z^, 


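where $W_1,\ldots,W_m$ are i.i.d. Bernoulli($p$) and $Z_1,\ldots,Z_m$ are i.i.d. Bernoulli($q$),
independent of $W_1,\ldots,W_m$. Hence, since

\[ \frac{\Delta}{\sqrt{\log n}}\left(\frac{n}{2}(p-q)\right) = \Delta\,\frac{\alpha-\beta}{2}\,\sqrt{\log n}, \]

the probability of $\deg_{\mathrm{in}}(i) - \deg_{\mathrm{out}}(i) < \frac{\Delta}{\sqrt{\log n}}\left(\frac{n}{2}(p-q)\right)$ is equal to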


t/2 


^ (z, - w) - z. > -aCI^ 


i=l 


a — P 


which is upper bounded by, 

'n/2 


{Zi - Wi) > -ACIo^ 


i=l 


a — P 


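Take $\Delta' = \Delta\,\frac{\alpha-\beta}{2}$ and recall Definition 4.13, then

\[
\begin{aligned}
\mathbb{P}\left[\deg_{\mathrm{in}}(i) - \deg_{\mathrm{out}}(i) < \frac{\Delta}{\sqrt{\log n}}\,\frac{n}{2}(p-q)\right]
&\le T\left(\frac{n}{2}, \frac{\alpha\log n}{n}, \frac{\beta\log n}{n}, -\Delta'\sqrt{\log n}\right) \\
&\le \exp\left[-\left(\frac{\alpha+\beta}{2} - \sqrt{\alpha\beta} - \delta(n)\right)\log n\right],
\end{aligned}
\]

where $\lim_{n\to\infty}\delta(n) = 0$, and the last inequality used Lemma 4.14.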
Via a simple union bound, it is easy to see that

\[ \mathbb{P}\left[\min_i\left(\deg_{\mathrm{in}}(i) - \deg_{\mathrm{out}}(i)\right) < \frac{\Delta}{\sqrt{\log n}}\,\frac{n}{2}(p-q)\right] \le \exp\left[-\left(\frac{\alpha+\beta}{2} - \sqrt{\alpha\beta} - 1 - \delta(n)\right)\log n\right], \]

which means that, as long as $\frac{\alpha+\beta}{2} - \sqrt{\alpha\beta} > 1$, (34) holds with high probability.
Straightforward algebraic manipulations show that (31) implies this condition,
concluding the proof of the Lemma.
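For completeness, the algebraic step relating the two forms of the threshold can be written out as follows. It uses only that $\alpha > \beta \ge 0$.

```latex
\[
\frac{\alpha+\beta}{2} - \sqrt{\alpha\beta}
  = \frac{1}{2}\left(\alpha - 2\sqrt{\alpha\beta} + \beta\right)
  = \frac{1}{2}\left(\sqrt{\alpha} - \sqrt{\beta}\right)^2 ,
\]
so that, since $\sqrt{\alpha} - \sqrt{\beta} > 0$,
\[
\frac{\alpha+\beta}{2} - \sqrt{\alpha\beta} > 1
  \iff \left(\sqrt{\alpha} - \sqrt{\beta}\right)^2 > 2
  \iff \sqrt{\alpha} - \sqrt{\beta} > \sqrt{2}.
\]
```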


□ 


5. Proof of the main result 


We will prove Theorems 3.1 and 3.2 through a few Lemmas. Let us define
$X$ as the non-diagonal part of $-L$ and $y \in \mathbb{R}^n$ as $y = \mathrm{diag}(D_X)$, meaning that
$y = \mathrm{diag}(L)$. Then $L = D_X - X$. We will separately lower bound $\max_i y_i$ and
upper bound $\|X\|$. The upper bound on $\|X\|$ is obtained by a direct application of
a result in [10].


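Lemma 5.1 (Remark 3.13 in [10]). Let $X$ be the $n \times n$ symmetric matrix with
independent centered entries. Then there exists a universal constant $c'$, such that
for every $t \ge 0$

\[ \mathbb{P}\left[\|X\| > 3\sigma + t\right] \le n\,e^{-t^2/(c'\sigma_\infty^2)}, \tag{35} \]

where we have defined

\[ \sigma := \max_i \sqrt{\sum_j \mathbb{E}\left[X_{ij}^2\right]}, \qquad \sigma_\infty := \max_{ij} \|X_{ij}\|_\infty. \]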


Before continuing with the proof let us recall the main idea: Lemma 5.1 gives
that, with high probability,

\[ \|X\| \lesssim \sigma + \sigma_\infty\sqrt{\log n}, \]

where $X$ is the off-diagonal part of $-L$. On the other hand, $L_{ii} = \sum_{j\in[n]\setminus i} X_{ij}$ has
variance $\sigma^2$. The Central Limit Theorem would thus suggest that $L_{ii}$ behaves like a
gaussian of variance $\sigma^2$. Since different sums only share a single summand they are
"almost" independent, which by itself would suggest that $\max_i L_{ii} \sim \sigma\sqrt{\log n}$, which
would imply the theorems. The proof that follows makes this argument precise.
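The heuristic above can be probed with a quick simulation. The sketch below makes several illustrative assumptions that are not part of the argument: the entries $X_{ij}$ are symmetric Rademacher variables (so $\sigma^2 = n-1$ and $\sigma_\infty = 1$), the function name `max_row_sum_ratio` is hypothetical, and the normalization $\sigma\sqrt{2\log n}$ is the usual gaussian-maximum scale, which agrees with $\sigma\sqrt{\log n}$ up to a constant factor.

```python
import math
import random


def max_row_sum_ratio(n, seed=0):
    """Return max_i sum_{j != i} X_ij divided by sigma * sqrt(2 log n).

    X is a symmetric matrix with independent Rademacher entries above the
    diagonal and zero diagonal (an illustrative choice of distribution).
    """
    rng = random.Random(seed)
    row = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            x = rng.choice((-1, 1))
            row[i] += x  # X_ij contributes to row i ...
            row[j] += x  # ... and, by symmetry, to row j
    sigma = math.sqrt(n - 1)
    return max(row) / (sigma * math.sqrt(2 * math.log(n)))
```

For moderate $n$ one would expect the returned ratio to be of order one, consistent with $\max_i L_{ii}$ being of the same order as $\sigma\sqrt{\log n}$; this is an empirical observation, not a substitute for the proof.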

We turn our attention to a lower bound on $\max_i y_i$. Recall that $y_i = \sum_{j=1}^n X_{ij}$.
More specifically, we are looking for an upper bound on

\[ \mathbb{P}\left[\max_{i\in[n]} y_i < t\right] \]

for a suitable value of $t$. We note that, if the $y_i$'s were independent then this
could be easily done via lower bounds on the upper tail of each $y_i$. Furthermore,
if the random variables $y_i$ were gaussian, obtaining such lower bounds would be
trivial. Unfortunately, the random variables in question are neither independent
nor gaussian, forcing major adaptations to this argument. In fact, we will actually
start by lower bounding

\[ \mathbb{E}\max_{i\in[n]} y_i. \]

We will obtain such a bound via a comparison (using Jensen's inequality) with
the maximum among certain independent random variables.


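Lemma 5.2. Let $\mathcal{I}$ and $\mathcal{J}$ be disjoint subsets of $[n]$. For $i \in \mathcal{I}$ define $z_i$ as

\[ z_i = \sum_{j\in\mathcal{J}} X_{ij}. \tag{36} \]

Then

\[ \mathbb{E}\max_{i\in[n]} y_i \ge \mathbb{E}\max_{i\in\mathcal{I}} z_i. \]

Proof.

\[ \mathbb{E}\max_{i\in[n]} y_i = \mathbb{E}\max_{i\in[n]} \sum_{j=1}^n X_{ij} \ge \mathbb{E}\max_{i\in\mathcal{I}} \sum_{j=1}^n X_{ij}. \]

Since $\mathcal{I}\cap\mathcal{J} = \emptyset$, $\{X_{ij}\}_{i\in\mathcal{I},\,j\in\mathcal{J}}$ is independent from $\{X_{ij}\}_{i\in\mathcal{I},\,j\notin\mathcal{J}}$, and so Jensen's
inequality gives

\[ \mathbb{E}\max_{i\in\mathcal{I}} \sum_{j=1}^n X_{ij} \ge \mathbb{E}\max_{i\in\mathcal{I}}\left[\sum_{j\in\mathcal{J}} X_{ij} + \mathbb{E}\sum_{j\notin\mathcal{J}} X_{ij}\right] = \mathbb{E}\max_{i\in\mathcal{I}} \sum_{j\in\mathcal{J}} X_{ij} = \mathbb{E}\max_{i\in\mathcal{I}} z_i. \]

□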

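The following Lemma guarantees the existence of sets $\mathcal{I}$ and $\mathcal{J}$ with desired properties.

Lemma 5.3. There exist $\mathcal{I}$ and $\mathcal{J}$ disjoint subsets of $[n]$ such that

\[ |\mathcal{I}| \ge \frac{n}{8}, \]

and, for every $i \in \mathcal{I}$,

\[ \mathbb{E}z_i^2 \ge \frac{\sigma^2}{8}, \]

where $z_i$ is defined, as in (36), to be $z_i = \sum_{j\in\mathcal{J}} X_{ij}$.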

Proof. Given the matrix $X$, we start by constructing a weighted graph on $n$ nodes
such that $w_{ij} = \mathbb{E}X_{ij}^2$ (note that $w_{ii} = 0$, for all $i$). Let $(S, S^c)$ be a partition of the
vertices of this graph, with $|S| \ge \frac{n}{2}$, that maximizes the cut

\[ \sum_{i\in S,\,j\in S^c} w_{ij}. \]

It is easy to see that the maximum cut needs to be at least half of the total edge
weights^. This readily implies

\[ \sum_{i\in S,\,j\in S^c} w_{ij} \ge \frac{1}{2}\sum_{i<j} w_{ij} = \frac{1}{4}\sum_{i\in[n]}\sum_{j\in[n]} w_{ij} = \frac{1}{4}\sum_{i\in[n]}\sum_{j\in[n]} \mathbb{E}X_{ij}^2 = \frac{1}{4}n\sigma^2. \]

Consider $z_i$, for $i \in S$, defined as

\[ z_i = \sum_{j\in S^c} X_{ij}. \]


^One can build such a cut by consecutively selecting memberships for each node in a greedy 
fashion as to maximize the number of incident edges cut, see [41]. 
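We proceed by claiming that the set $\mathcal{I} \subset S$ of indices $i \in S$ for which

\[ \mathbb{E}z_i^2 \ge \frac{\sigma^2}{8} \]

satisfies $|\mathcal{I}| \ge \frac{n}{8}$. Thus, taking $\mathcal{J} = S^c$ would establish the Lemma.

To justify the claim, note that

\[ \sum_{i\in S} \mathbb{E}z_i^2 = \sum_{i\in S,\,j\in S^c} w_{ij} \ge \frac{1}{4}n\sigma^2, \]

and

\[ \sum_{i\in S} \mathbb{E}z_i^2 \le |\mathcal{I}|\max_{i\in S}\mathbb{E}z_i^2 + \left(|S| - |\mathcal{I}|\right)\frac{\sigma^2}{8} \le \left(|\mathcal{I}| + \frac{1}{8}|S|\right)\sigma^2, \]

implying that $\left(|\mathcal{I}| + \frac{1}{8}n\right)\sigma^2 \ge \frac{1}{4}n\sigma^2$.

□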
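The greedy construction mentioned in the footnote to this proof can be sketched as follows. This is a standard greedy heuristic written under stated assumptions (symmetric nonnegative weights with zero diagonal, nodes processed in index order, hypothetical function name `greedy_cut`); it only guarantees a cut of at least half of the total weight and is not claimed to be the maximizing partition used in the proof.

```python
def greedy_cut(w):
    """Greedy cut for a symmetric nonnegative weight matrix w with zero diagonal.

    Returns (side, cut_weight, total_weight), where side[v] is 0 or 1.
    """
    n = len(w)
    side = [0] * n
    for v in range(n):
        # Weight from v to the already-placed nodes on each side.
        to_side0 = sum(w[v][u] for u in range(v) if side[u] == 0)
        to_side1 = sum(w[v][u] for u in range(v) if side[u] == 1)
        # Place v opposite the heavier side, so at least half of its weight
        # to earlier nodes is cut.
        side[v] = 1 if to_side0 >= to_side1 else 0
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    cut = sum(w[i][j] for i, j in pairs if side[i] != side[j])
    total = sum(w[i][j] for i, j in pairs)
    return side, cut, total
```

To match the convention $|S| \ge n/2$ in the proof, one can take $S$ to be the larger of the two sides returned.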

We now proceed by obtaining a lower bound for $\mathbb{E}\max_{i\in\mathcal{I}} z_i$, where $\mathcal{I}$ and $z_i$
are defined to satisfy the conditions in Lemma 5.3. We note that at this point
the random variables $z_i$ are independent and each is a sum of independent random
variables. We use Lemma 8.1 of [31] (for a fixed constant $\gamma = 1$) to obtain a lower
bound on the upper tail of each $z_i$.

Lemma 5.4. [Lemma 8.1 of [31]] In the setting described above, there exist two
universal positive constants $K$ and $\varepsilon$ such that for every $t$ satisfying $t \ge K\frac{\sigma}{\sqrt{8}}$ and
$t \le \varepsilon\frac{\sigma^2}{\sqrt{8}\,\sigma_\infty}$, we have (for every $i \in \mathcal{I}$ separately)

\[ \mathbb{P}\left[z_i > t\right] \ge \exp\left(-8\frac{t^2}{\sigma^2}\right). \]

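We are now ready to establish a lower bound on $\mathbb{E}\max_{i\in[n]} y_i$.

Lemma 5.5. In the setting described above, there exist two universal positive constants
$K$ and $\varepsilon$ such that for every $t$ satisfying $t \ge K\frac{\sigma}{\sqrt{8}}$ and $t \le \varepsilon\frac{\sigma^2}{\sqrt{8}\,\sigma_\infty}$, we have

\[ \mathbb{E}\max_{i\in[n]} y_i \ge t - (t + n\sigma_\infty)\exp\left(-\frac{n}{8\exp\left(8\frac{t^2}{\sigma^2}\right)}\right). \]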

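Proof. Let $K$ and $\varepsilon$ be the universal constants in Lemma 5.4 and $t$ such that
$K\frac{\sigma}{\sqrt{8}} \le t \le \varepsilon\frac{\sigma^2}{\sqrt{8}\,\sigma_\infty}$. Lemma 5.4 guarantees that, for any $i \in \mathcal{I}$,

\[ \mathbb{P}\left[z_i > t\right] \ge \exp\left(-8\frac{t^2}{\sigma^2}\right). \]

Due to the independence of the random variables $z_i$, we have

\[
\begin{aligned}
\mathbb{P}\left[\max_{i\in\mathcal{I}} z_i \le t\right]
&= \prod_{i\in\mathcal{I}} \mathbb{P}\left[z_i \le t\right] = \prod_{i\in\mathcal{I}} \left(1 - \mathbb{P}\left[z_i > t\right]\right) \\
&\le \left(1 - \frac{1}{\exp\left(8\frac{t^2}{\sigma^2}\right)}\right)^{|\mathcal{I}|}
\le \left(1 - \frac{1}{\exp\left(8\frac{t^2}{\sigma^2}\right)}\right)^{n/8}
\le \exp\left(-\frac{n}{8\exp\left(8\frac{t^2}{\sigma^2}\right)}\right),
\end{aligned}
\]

where the second to last inequality follows from the fact that $|\mathcal{I}| \ge \frac{n}{8}$ and the last
from the fact that $\left(1 - \frac{1}{x}\right)^x \le \exp(-1)$ for $x > 1$.

Since $\|X_{ij}\|_\infty \le \sigma_\infty$ we have that, almost surely, $z_i \ge -(n-1)\sigma_\infty$. Thus,

\[ \mathbb{E}\max_{i\in[n]} y_i \ge \mathbb{E}\max_{i\in\mathcal{I}} z_i \ge t\left[1 - \exp\left(-\frac{n}{8\exp\left(8\frac{t^2}{\sigma^2}\right)}\right)\right] - (n-1)\sigma_\infty\exp\left(-\frac{n}{8\exp\left(8\frac{t^2}{\sigma^2}\right)}\right), \]

which establishes the Lemma.

□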

The last ingredient we need is a concentration result to control the lower tail of
$\max_{i\in[n]} y_i$ by controlling its fluctuations around $\mathbb{E}\max_{i\in[n]} y_i$. We make use of a
result in [33].

Lemma 5.6. In the setting described above, define $v$ as

\[ v = \mathbb{E}\left[\max_{i\in[n]} \sum_{j=1}^n \left(X_{ij} - X'_{ij}\right)^2\right], \tag{37} \]

where $X'$ is an independent identically distributed copy of $X$.
Then, for any $x > 0$:

\[ \mathbb{P}\left[\max_{i\in[n]} y_i \le \mathbb{E}\left[\max_{i\in[n]} y_i\right] - x\right] \le \exp\left(-\frac{x^2}{7(v + \sigma_\infty x)}\right). \]

Proof. This Lemma is a direct consequence of Theorem 12 in [33] by taking the
independent random variables to be vectors, indexed by the pairs $(i,j)$, whose $t$-th coordinate
equals $X_{ij}$ if $t = i$ and $0$ otherwise. We note that there is a small typo (in the definition of the
quantity $v$) in the Theorem as stated in [33]. □

At this point we need an upper bound on the quantity $v$ defined in (37). This is
the purpose of the following Lemma.

Lemma 5.7. In the setting above, let $X'$ be an independent identically distributed
copy of $X$, then

\[ \mathbb{E}\left[\max_{i\in[n]} \sum_{j=1}^n \left(X_{ij} - X'_{ij}\right)^2\right] \le 9\sigma^2 + 90\sigma_\infty^2\log n. \]


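Proof. We apply a Rosenthal-type inequality from Theorem 8 of [13], for each $i \in [n]$
separately, and get, for any integer $p$ and $0 < \delta < 1$,

\[
\begin{aligned}
\left\|\sum_{j=1}^n \left(X_{ij} - X'_{ij}\right)^2\right\|_p
&\le (1+\delta)\,\mathbb{E}\left[\sum_{j=1}^n \left(X_{ij} - X'_{ij}\right)^2\right] + \frac{2p}{\delta}\left\|\max_{j} \left(X_{ij} - X'_{ij}\right)^2\right\|_p \\
&\le 2(1+\delta)\sigma^2 + \frac{8p}{\delta}\sigma_\infty^2.
\end{aligned}
\tag{38}
\]

It is easy to see that

\[ \mathbb{E}\left[\max_{i\in[n]} \sum_{j=1}^n \left(X_{ij} - X'_{ij}\right)^2\right] \le n^{\frac{1}{p}}\max_{i\in[n]}\left\|\sum_{j=1}^n \left(X_{ij} - X'_{ij}\right)^2\right\|_p. \]

Thus, taking $p = \lceil\alpha\log n\rceil$ for some $\alpha > 0$ gives

\[
\begin{aligned}
\mathbb{E}\left[\max_{i\in[n]} \sum_{j=1}^n \left(X_{ij} - X'_{ij}\right)^2\right]
&\le n^{\frac{1}{\lceil\alpha\log n\rceil}}\,2(1+\delta)\sigma^2 + n^{\frac{1}{\lceil\alpha\log n\rceil}}\,\frac{8\lceil\alpha\log n\rceil}{\delta}\sigma_\infty^2 \\
&\le e^{\frac{1}{\alpha}}\,2(1+\delta)\sigma^2 + e^{\frac{1}{\alpha}}\,\frac{8\lceil\alpha\log n\rceil}{\delta}\sigma_\infty^2.
\end{aligned}
\]

Taking, for example, $\delta = 0.5$ and $\alpha = 1$ gives

\[ \mathbb{E}\left[\max_{i\in[n]} \sum_{j=1}^n \left(X_{ij} - X'_{ij}\right)^2\right] \le 9\sigma^2 + 90\sigma_\infty^2\log n. \]

□

We now collect all our bounds in a master Lemma.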


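Lemma 5.8. In the setting described above, there exist universal constants $K > 0$
and $\varepsilon > 0$ such that, for any $t$ satisfying $K\frac{\sigma}{\sqrt{8}} \le t \le \varepsilon\frac{\sigma^2}{\sqrt{8}\,\sigma_\infty}$, we have

\[ \mathbb{P}\left[\max_{i\in[n]} y_i \le \frac{t}{2} - (t + n\sigma_\infty)\exp\left(-\frac{n}{8\exp\left(8\frac{t^2}{\sigma^2}\right)}\right)\right] \le \exp\left(-\frac{t^2/10^4}{\sigma^2 + \sigma_\infty^2\log n + \sigma_\infty t}\right). \]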


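Proof. Let $t > 0$ satisfy the hypothesis of the Lemma, and $x > 0$.
Recall that Lemma 5.6 gives

\[ \mathbb{P}\left[\max_{i\in[n]} y_i \le \mathbb{E}\left[\max_{i\in[n]} y_i\right] - x\right] \le \exp\left(-\frac{x^2}{7(v + \sigma_\infty x)}\right). \]

On the other hand, Lemmas 5.5 and 5.7 control, respectively, $\mathbb{E}\left[\max_{i\in[n]} y_i\right]$ and $v$,
giving

\[ \mathbb{E}\left[\max_{i\in[n]} y_i\right] \ge t - (t + n\sigma_\infty)\exp\left(-\frac{n}{8\exp\left(8\frac{t^2}{\sigma^2}\right)}\right), \]

and

\[ v \le 9\sigma^2 + 90\sigma_\infty^2\log n. \]

Combining all these bounds,

\[ \mathbb{P}\left[\max_{i\in[n]} y_i \le t - (t + n\sigma_\infty)\exp\left(-\frac{n}{8\exp\left(8\frac{t^2}{\sigma^2}\right)}\right) - x\right] \le \exp\left(-\frac{x^2}{7\left(9\sigma^2 + 90\sigma_\infty^2\log n + \sigma_\infty x\right)}\right). \]

Taking $x = t/2$ establishes the Lemma.

□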

At this point, the proofs of Theorems 3.1 and 3.2 will consist essentially of 
applying Lemma 5.8 for appropriate values of t. 


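Proof. [of Theorem 3.1]

Let $\beta > 0$ be a constant to be defined later. Taking $t = \beta\sigma\sqrt{\log n}$ in Lemma 5.8
gives that, in the setting described above,

\[
\begin{aligned}
\mathbb{P}\left[\max_{i\in[n]} y_i \le \frac{\beta}{2}\sigma\sqrt{\log n} - \left(\beta\sigma\sqrt{\log n} + n\sigma_\infty\right)\exp\left(-\frac{n^{1-8\beta^2}}{8}\right)\right]
&\le \exp\left(-\frac{\beta^2\sigma^2\log n/10^4}{\sigma^2 + \sigma_\infty^2\log n + \sigma_\infty\beta\sigma\sqrt{\log n}}\right) \\
&= \exp\left(-\frac{\beta^2\log n/10^4}{1 + \left(\frac{\sigma_\infty}{\sigma}\right)^2\log n + \frac{\sigma_\infty}{\sigma}\beta\sqrt{\log n}}\right) \\
&= n^{-\frac{\beta^2/10^4}{1 + \left(\frac{\sigma_\infty}{\sigma}\right)^2\log n + \frac{\sigma_\infty}{\sigma}\beta\sqrt{\log n}}},
\end{aligned}
\]

provided that $K\frac{\sigma}{\sqrt{8}} \le \beta\sigma\sqrt{\log n} \le \varepsilon\frac{\sigma^2}{\sqrt{8}\,\sigma_\infty}$, where $K$ and $\varepsilon$ are the universal constants
in Lemma 5.8.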

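We start by noting that, if $0 < \beta \le \frac{1}{3}$ independent of $n$, then, for $n$ large
enough (not depending on $\sigma$ or $\sigma_\infty$),

\[ \left(\beta\sigma\sqrt{\log n} + n\sigma_\infty\right)\exp\left(-\frac{n^{1-8\beta^2}}{8}\right) \le \frac{\beta}{6}\sigma\sqrt{\log n}. \]

Thus, provided that $\frac{K}{\sqrt{8\log n}} \le \beta \le \min\left\{\varepsilon\frac{\sigma}{\sqrt{8\log n}\,\sigma_\infty}, \frac{1}{3}\right\}$,

\[ \mathbb{P}\left[\max_{i\in[n]} y_i < \frac{\beta}{3}\sigma\sqrt{\log n}\right] \le n^{-\frac{\beta^2/10^4}{1 + \left(\frac{\sigma_\infty}{\sigma}\right)^2\log n + \frac{\sigma_\infty}{\sigma}\beta\sqrt{\log n}}}. \]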

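Let $c$ be the constant in the hypothesis of the theorem, then $\sigma \ge c\sqrt{\log n}\,\sigma_\infty$.
Let $\beta = \min\left\{\frac{\varepsilon c}{\sqrt{8}}, \frac{1}{3}\right\}$. Clearly, for $n$ large enough,

\[ \frac{K}{\sqrt{8\log n}} \le \min\left\{\frac{\varepsilon c}{\sqrt{8}}, \frac{1}{3}\right\} \le \min\left\{\varepsilon\frac{\sigma}{\sqrt{8\log n}\,\sigma_\infty}, \frac{1}{3}\right\}, \]

and

\[ \mathbb{P}\left[\max_{i\in[n]} y_i < \min\left\{\frac{\varepsilon c}{\sqrt{8}}, \frac{1}{3}\right\}\frac{\sigma\sqrt{\log n}}{3}\right] \le n^{-10^{-4}\left[\max\left\{\frac{8}{(\varepsilon c)^2},\,9\right\} + \max\left\{\frac{8}{\varepsilon^2 c^4},\,\frac{9}{c^2}\right\} + \max\left\{\frac{\sqrt{8}}{\varepsilon c^2},\,\frac{3}{c}\right\}\right]^{-1}}. \]

This implies that there exist constants $c_1'$, $C_1'$ and $\beta_1'$ such that

\[ \mathbb{P}\left[\max_{i\in[n]} L_{ii} \le C_1'\sigma\sqrt{\log n}\right] \le c_1' n^{-\beta_1'}. \]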


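Recall that Lemma 5.1 ensures that, for a universal constant $c'$, and for every
$u \ge 0$, by taking $t = u\sigma$,

\[ \mathbb{P}\left[\|X\| > (3+u)\sigma\right] \le n\,e^{-u^2\sigma^2/(c'\sigma_\infty^2)}. \tag{39} \]

It is easy to see that $n\,e^{-u^2\sigma^2/(c'\sigma_\infty^2)} \le n\,e^{-u^2(\log n)c^2/c'} = n^{1 - u^2c^2/c'}$. Taking $u = \sqrt{2c'}/c$ gives

\[ \mathbb{P}\left[\|X\| > \left(3 + \frac{\sqrt{2c'}}{c}\right)\sigma\right] \le n^{-1}. \]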
This means that, with probability at least $1 - c_1' n^{-\beta_1'} - n^{-1}$, we have

\[ \|X\| \le \left(3 + \frac{\sqrt{2c'}}{c}\right)\sigma \le \frac{3 + \frac{\sqrt{2c'}}{c}}{C_1'\sqrt{\log n}}\,\max_{i\in[n]} L_{ii}, \]

which, together with the fact that $\lambda_{\max}(L) \le \|X\| + \max_{i\in[n]} L_{ii}$, establishes the
theorem.

□


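Proof. [of Theorem 3.2]

If $\sigma > \sqrt{\log n}\,\sigma_\infty$ then the result follows immediately from Theorem 3.1. For
that reason we restrict our attention to the instances with $\sigma \le \sqrt{\log n}\,\sigma_\infty$. We start
by setting

\[ t = 2\sigma\left(\frac{\sigma}{\sigma_\infty}\right)^{\frac{1}{2}}(\log n)^{\frac{1}{8}}. \tag{40} \]

Recall that there exist $c$ and $\gamma > 0$ such that $\sigma \ge c(\log n)^{\frac{1}{4}+\gamma}\sigma_\infty$, or equivalently

\[ \frac{\sigma}{\sigma_\infty} \ge c(\log n)^{\frac{1}{4}+\gamma}. \]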

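This guarantees that, for $n$ large enough (not depending on $\sigma$ or $\sigma_\infty$), the conditions
in Lemma 5.8 are satisfied. In fact,

\[ \frac{K\sigma}{\sqrt{8}} \le 2\sigma\sqrt{c}\,(\log n)^{\frac{1}{4}+\frac{\gamma}{2}} \le 2\sigma\sqrt{\frac{\sigma}{\sigma_\infty}}\,(\log n)^{\frac{1}{8}} \le \frac{\varepsilon\sigma}{\sqrt{8}}\sqrt{\frac{\sigma}{\sigma_\infty}}\,\sqrt{c}\,(\log n)^{\frac{1}{8}+\frac{\gamma}{2}} \le \frac{\varepsilon\sigma^2}{\sqrt{8}\,\sigma_\infty}. \]

Hence, Lemma 5.8 gives, for $t$ as in (40),

\[ \mathbb{P}\left[\max_{i\in[n]} y_i \le \frac{t}{2} - (t + n\sigma_\infty)\exp\left(-\frac{n}{8\exp\left(8\frac{t^2}{\sigma^2}\right)}\right)\right] \le \exp\left(-\frac{t^2/10^4}{\sigma^2 + \sigma_\infty^2\log n + \sigma_\infty t}\right). \]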

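We proceed by noting that, for $t = 2\sigma\left(\frac{\sigma}{\sigma_\infty}\right)^{\frac{1}{2}}(\log n)^{\frac{1}{8}}$ and $n$ large enough (not
depending on $\sigma$ or $\sigma_\infty$),

\[ (t + n\sigma_\infty)\exp\left(-\frac{n}{8\exp\left(8\frac{t^2}{\sigma^2}\right)}\right) \le \frac{t}{6}. \]

In fact, since $\sigma \le \sigma_\infty\sqrt{\log n}$,

\[ \exp\left(-\frac{n}{8\exp\left(8\frac{t^2}{\sigma^2}\right)}\right) \le \exp\left(-\frac{n}{8\exp\left(32(\log n)^{3/4}\right)}\right) \]

decreases faster than any polynomial.

Hence, since $t \ge 2\sigma\sqrt{c}\,(\log n)^{\frac{1}{4}+\frac{\gamma}{2}}$,

\[ \mathbb{P}\left[\max_{i\in[n]} y_i < \frac{2}{3}\sigma\sqrt{c}\,(\log n)^{\frac{1}{4}+\frac{\gamma}{2}}\right] \le \exp\left(-\frac{\left[2\sigma\left(\frac{\sigma}{\sigma_\infty}\right)^{\frac{1}{2}}(\log n)^{\frac{1}{8}}\right]^2/10^4}{\sigma^2 + \sigma_\infty^2\log n + \sigma_\infty 2\sigma\left(\frac{\sigma}{\sigma_\infty}\right)^{\frac{1}{2}}(\log n)^{\frac{1}{8}}}\right). \]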
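We proceed by noting that

\[ \frac{\left[2\sigma\left(\frac{\sigma}{\sigma_\infty}\right)^{\frac{1}{2}}(\log n)^{\frac{1}{8}}\right]^2}{\sigma^2 + \sigma_\infty^2\log n + \sigma_\infty 2\sigma\left(\frac{\sigma}{\sigma_\infty}\right)^{\frac{1}{2}}(\log n)^{\frac{1}{8}}} = \frac{4(\log n)^{\frac{1}{4}}}{\frac{\sigma_\infty}{\sigma} + \left(\frac{\sigma_\infty}{\sigma}\right)^3\log n + 2\left(\frac{\sigma_\infty}{\sigma}\right)^{\frac{3}{2}}(\log n)^{\frac{1}{8}}}. \]

Since $\frac{\sigma_\infty}{\sigma} \le \frac{1}{c}(\log n)^{-\frac{1}{4}-\gamma}$ we have that, for $n$ large enough and a constant $c''$,

\[ \mathbb{P}\left[\max_{i\in[n]} y_i < \frac{2}{3}\sigma\sqrt{c}\,(\log n)^{\frac{1}{4}+\frac{\gamma}{2}}\right] \le \exp\left(-c''(\log n)^{3\gamma}\right). \]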


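At this point we upper bound $\|X\|$, as in the proof of Theorem 3.1. Recall, as
in (39), for any $u > 0$,

\[ \mathbb{P}\left[\|X\| > (3+u)\sigma\right] \le n\,e^{-u^2\sigma^2/(c'\sigma_\infty^2)}. \]

Hence,

\[ \mathbb{P}\left[\|X\| > (3+u)\sigma\right] \le n\,e^{-u^2c^2(\log n)^{\frac{1}{2}+2\gamma}/c'}. \]

Taking $u = (\log n)^{\frac{1}{4}}$ gives

\[ \mathbb{P}\left[\|X\| > \left(3 + (\log n)^{\frac{1}{4}}\right)\sigma\right] \le n\,e^{-c^2(\log n)^{1+2\gamma}/c'}. \]

The rest of the proof follows the final arguments in the proof of Theorem 3.1.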

□ 


6. Conclusion and future directions 

Theorems 3.1 and 3.2 are valid for matrices whose entries may be distributed in 
very different ways. This potentially allows one to use them in order to obtain strong 
guarantees for deterministically censored versions of the problems described, where 
the measurements are obtained only for edges of a deterministic graph (a similar 
model was studied, for example, in [1]). 

The problem of recovery in the stochastic block model with multiple balanced 
clusters, also referred to as multisection, is a natural generalization of the one 
considered here and also admits a semidefinite relaxation. While the results here 
do not seem to be directly applicable in the analysis of that algorithm, in part 
because the construction of a dual certificate in that setting is considerably more 
involved, some of the ideas in the present paper can be adapted for the estimates 
needed there. These also provide interpretable, and sharp, guarantees. We refer 
the interested reader to [4]. 

Regarding directions for future investigations, from the random matrix side of
things it would be interesting to investigate what happens when $\sigma \gg \sigma_\infty$ but
$\sigma/\sigma_\infty = o\left((\log n)^{1/4}\right)$, as this setting is not captured by our results. It would be
particularly interesting also to understand whether analogues of these results exist
for instances where the off-diagonal entries of $L$ are not independent ®.


®For the particular example of connectivity of an Erdős-Rényi graph, it is possible to use the
matrix concentration approach [44, 45] to obtain a guarantee that, while being a factor away from
optimal, appears to be adaptable to instances where edges have particular types of dependencies;
we refer the reader to Section 5.3 in the monograph [45].
From the point of view of applications, a natural question is which other semidefinite
relaxations have these optimality guarantees. A general understanding in
that direction would be remarkable.